INTRODUCTION

The Vision Flashes are informal working papers intended primarily
to stimulate internal interaction among participants in the A.I.
Laboratory's Vision and Robotics group. Many of them report
highly tentative conclusions or incomplete work. Others deal
with highly detailed accounts of local equipment and programs
that lack general interest. Still others are of great
importance, but lack the polish and elaborate attention to proper
referencing that characterizes the more formal literature.

Nevertheless, the Vision Flashes collectively represent the only
documentation of an important fraction of the work done in
machine vision and robotics. The purpose of this report is to
make the findings more readily available, but since they are not
revised as presented here, readers should keep in mind the
original purpose of the papers!

Many report on details of the M.I.T. blocks world vision system.
The entire spectrum of vision processing is represented, from low
level feature finding to high level scene analysis requiring
extensive world knowledge and deductive power. On all levels,
they reflect a movement from ad hoc programs and testing toward
the sound theory that one expects from successful science.

The most recent papers shift attention away from the now well
understood plane polyhedron world and toward two new foci:

1) real world vision

2) applications for machines with vision.

Careful study will show that work in these new areas is
productively guided by the evolving ideas and metaphors of
artificial intelligence in general and by our earlier work in
simple visual worlds.

ABSTRACT

This is an introduction to some of the M.I.T. A.I. vision work of
the last few years. The topics discussed are 1) Waltz's work on
line drawing semantics 2) heterarchy 3) the ancient learning
business and 4) copying scenes. All topics are discussed in more
detail elsewhere in Vision Flashes or theses.



This thesis was originally published as Vision Flash 30.

INTRODUCTION

Research in machine vision is an important activity in
artificial intelligence laboratories for two major reasons:
First, understanding vision is a worthy subject for its own sake.
The point of view of artificial intelligence allows a fresh new
look at old questions and exposes a great deal about vision in
general, independent of whether man or machine is the seeing
agent. Second, the same problems found in understanding vision
are of central interest in the development of a broad theory of
intelligence. Making a machine see brings one to grips with
problems like that of knowledge interaction on many levels and of
large system organization. In vision these key issues are
exhibited with enough substance to be nontrivial and enough
simplicity to be tractable.

These objectives have led vision research at MIT to focus
on two particular goals: learning from examples and copying from
spare parts. Both goals are framed in terms of a world of
bricks, wedges, and other simple shapes like those found in
children's toy boxes.

Good purposeful description is often fundamental to
research in artificial intelligence, and learning how to do
description constitutes a major part of our effort in vision
research. This essay begins with a discussion of that part of
scene analysis known as body finding. The intention is to show
how our understanding has evolved away from blind fumbling toward
substantive theory.

The next section revolves around the organizational
metaphors and the rules of good programming practice appropriate
for thinking about large knowledge-oriented systems. Finding
groups of objects and using the groups to get at the properties
of their members illustrates concretely how some of the ideas
about systems work out in detail.

The topic of learning follows. Discussing learning is
especially appropriate here not only because it is an important
piece of artificial intelligence theory but also because it
illustrates a particular use for the elaborate analysis machinery
dealt with in the previous sections.

EVOLUTION OF A SEMANTIC THEORY
Guzman and the Body Problem

The body finding story begins with an ad hoc but crisp
syntactic theory and ends in a simple, appealing theory with
serious semantic roots. In this the history of the body finding
problem seems paradigmatic of vision system progress in general.

Adolfo Guzman started the work in this area (Guzman
1968). I review his program here in order to anchor the
discussion and show how better programs emerge through the
interaction of observation, experiment, and theory.

The task is simply to partition the observed regions of a
scene into distinct bodies. In figure 1, for example, a
reasonable program would report something like (A B C) and (D E)
as a plausible partitioning of the five regions into, in this
case, two bodies. Keep in mind that the program is after only
one good, believable answer. Many simple scenes have several
equally justifiable interpretations.

Guzman's program operates on scenes in two distinct 
passes, both of which are quite straightforward. The first pass 
gathers local evidence and the second weighs that evidence and 
offers an opinion about how the regions should be grouped 
together into bodies.

Figure 1

The task of the body finding program is to understand how the
regions of the scene form bodies.

The local evidence pass uses the vertices to generate
little pieces of evidence indicating which of the surrounding
regions belong to the same body. These quanta of evidence are
called links. Figure 2 lists each vertex type recognized and
shows how each contributes to the set of links. The arrow links
always argue that the shaft-bordering regions belong together,
the fork more ambitiously provides three such links, one for each
pair of surrounding regions, and so on. The resulting links for
the scene in figure 1 are displayed superimposed on the original
drawing in figure 3a. Internally the links are represented in
list structure equivalent to the abstract diagram in figure 3b.
There the circles each represent the correspondingly lettered
region from figure 3a. The arcs joining the circles represent
links.
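
The link-generating step can be illustrated with a minimal sketch that covers only the arrow and fork rules stated above. The `Vertex` record, the region names, and the convention that an arrow lists its two shaft-bordering regions first are assumptions made for this example; they are not Guzman's list structure, and the vertices are not read off figure 1.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Vertex:
    kind: str       # "arrow", "fork", or "L" in this sketch
    regions: tuple  # surrounding regions; for an arrow the two
                    # shaft-bordering regions are listed first (assumed)

def links_for(vertex):
    """Return the links (unordered region pairs) one vertex contributes."""
    if vertex.kind == "arrow":
        return [frozenset(vertex.regions[:2])]
    if vertex.kind == "fork":
        return [frozenset(p) for p in combinations(vertex.regions, 2)]
    return []  # other vertex types are left out of this sketch

# Hypothetical vertices, invented for illustration.
vertices = [
    Vertex("arrow", ("A", "B", "background")),
    Vertex("fork", ("A", "B", "C")),
    Vertex("L", ("C", "background")),
]
links = [link for v in vertices for link in links_for(v)]
print(len(links))  # one link from the arrow plus three from the fork
```

Keeping duplicate links in the list matters, because the later evidence-weighing theories count how many links join the same pair of regions.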

The job of pass two is to combine the link evidence into
a parsing hypothesis. How Guzman's pass two approached its final
form may be understood by imagining a little series of theories
about how to use the evidence to best advantage. Figure 3a is so
simple that almost any method will do. Consequently figure 4 and
figure 5 are used to further illustrate the experimental
observations behind the evolving sequence of theories.

Figure 2

The Guzman links for various vertex types.

Figure 3

The links formed by the vertices of a simple scene.

Figure 4

Various linking algorithms cause this to be seen as two, three,
or four bodies.

Figure 5

A coincidence causes an incorrect link.

The first theory to think about is very simple. It
argues that any two regions belong to the same body if there is
a link between them. The theory works fine on many scenes,
certainly on those in figure 3a and figure 4. It is easy,
however, to think of examples that fool this theory because it is
far too inclined toward enthusiastic region binding. Wherever a
coincidence produces an accidental link, as for example the links
placed by the spurious psi vertex in figure 5, an error occurs in
the direction of too much conglomeration.
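
Theory one amounts to finding the connected components of the link graph. The following is a minimal sketch of that reading, using a small union-find structure; the region names and link lists are hypothetical and are not taken from the figures.

```python
def theory_one(regions, links):
    """Group regions into bodies: any single link binds two regions."""
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path halving
            r = parent[r]
        return r

    for a, b in links:
        parent[find(a)] = find(b)

    bodies = {}
    for r in regions:
        bodies.setdefault(find(r), []).append(r)
    return sorted(sorted(body) for body in bodies.values())

# Hypothetical link lists, invented for illustration.
regions = ["A", "B", "C", "D", "E"]
good_links = [("A", "B"), ("B", "C"), ("A", "C"), ("D", "E")]
clean = theory_one(regions, good_links)

# One accidental link is enough to fuse the two bodies.
fooled = theory_one(regions, good_links + [("C", "D")])
print(clean)
print(fooled)
```

The second call shows the weakness described above: a single spurious link conglomerates everything into one body.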

The problem is corrected in theory two. Theory two
differs from theory one because it requires two links for binding
rather than just one. By insisting on more evidence, local
evidence anomalies are diluted in their potential to damage the
end result. Such a method works fine for figure 5, but as a
general solution the two link scheme also falters, now on the
side of stinginess. In figure 4, partitioning by this second
theory yields (A B) (C) (D) (E F).

This stinginess can also be fixed. The first step is to
refine theory two into theory three by iterating the amalgamation
procedure. The idea is to think of previously joined together
region groups as subject themselves to conglomeration in the same
way regions are joined. After one pass over the links of figure
4, we have A and B joined together. But the combination is
linked to C by two links, causing C to be sucked in on a second
run through the linking loop. Theory three then produces (A B C)
(D) (E F) as its opinion.
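
The difference between theories two and three can be sketched as follows. The link list is hypothetical: it was chosen only so that the two functions reproduce the partitions quoted in the text, and it is not claimed to be the actual link set of figure 4.

```python
from collections import Counter

def as_bodies(groups):
    return sorted(sorted(g) for g in groups)

def theory_two(regions, links):
    """Join two regions only if at least two links connect them directly."""
    counts = Counter(frozenset(link) for link in links)
    groups = [{r} for r in regions]
    for pair, n in counts.items():
        if n >= 2:
            a, b = pair
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb
    return as_bodies(groups)

def theory_three(regions, links):
    """Iterate: groups already formed merge when two or more links join them."""
    groups = [{r} for r in regions]
    changed = True
    while changed:
        changed = False
        owner = {r: i for i, g in enumerate(groups) for r in g}
        counts = Counter()
        for a, b in links:
            i, j = owner[a], owner[b]
            if i != j:
                counts[(min(i, j), max(i, j))] += 1
        for (i, j), n in counts.items():
            if n >= 2:
                groups[i] |= groups[j]
                del groups[j]
                changed = True
                break  # group indices are stale after a merge, so recount
    return as_bodies(groups)

# Hypothetical links: A-B twice, A-C, B-C, C-D, E-F twice.
regions = ["A", "B", "C", "D", "E", "F"]
links = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"), ("E", "F"), ("E", "F")]
two = theory_two(regions, links)
three = theory_three(regions, links)
print(two)
print(three)
```

Region C joins only under theory three, because its two links reach different members of the already merged group.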

Theory four supplements three by adding a simple special-case
heuristic. If a region has only a single link to another
region, they are combined. This brings figure 4 around to (A B C
D) (E F) as the result, without reintroducing the generosity
problem that came up in figure 5 when using theory one. That
scene is now also correctly separated into bodies.
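
A minimal sketch of the special-case heuristic follows. It assumes one particular reading of the rule, namely that a region qualifies when it takes part in exactly one link in total; the grouping passed in stands for the output of the iterated two-link procedure, and all link data are hypothetical.

```python
from collections import Counter

def apply_single_link_rule(groups, links):
    """Combine any region that has exactly one link with its lone partner."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    groups = [set(g) for g in groups]
    for a, b in links:
        if degree[a] == 1 or degree[b] == 1:
            ga = next(g for g in groups if a in g)
            gb = next(g for g in groups if b in g)
            if ga is not gb:
                groups.remove(gb)
                ga |= gb
    return sorted(sorted(g) for g in groups)

# Hypothetical links; region D has a single link, to C.
links = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"),
         ("C", "D"), ("E", "F"), ("E", "F")]
result = apply_single_link_rule([{"A", "B", "C"}, {"D"}, {"E", "F"}], links)

# A lone accidental link between two well-linked regions is not enough.
spurious = [("P", "R"), ("P", "R"), ("Q", "S"), ("Q", "S"), ("P", "Q")]
untouched = apply_single_link_rule([{"P", "R"}, {"Q", "S"}], spurious)
print(result)
print(untouched)
```

Under this reading the accidental link in the second example is ignored because both of its regions have other links, which is how the generosity problem stays away.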




Only one more refinement is necessary to complete this
sequence of imagined theories and bring us close to Guzman's
final program. The required addition is motivated by scenes
like that of figure 6. There we have again too much linking as a
result of the indicated fork vertex. Although not really wrong,
the one object answer seems less likely to humans than a report
of two objects. Guzman overcame this sort of problem toward the
end of his thesis work not by augmenting still further the
evidence weighing but rather by refining the way evidence is
originally generated. The basic change is that all placement of
links is subject to inhibition by contrary evidence from adjacent
vertices. In particular, no link is placed across a line if its
other end is the barb of an arrow, a leg of an L, or a part of
the crossbar of a T. This is enough to correctly handle the
problem of figure 6. Adding this link inhibition idea gives us
Guzman's program in its final form. In the first pass the
program gathers evidence through the vertex inspired links that
are not inhibited by adjacent vertices. In the second pass,
these links cause binding together whenever two regions or sets
of previously bound regions are connected by two or more links.
It is a somewhat complex but reasonably talented program which
usually returns the most likely partition of a scene into bodies.
Figure 6

The fork vertex causes the two bodies to be linked together
unless the offending links are inhibited by the adjacent arrows.

But does this program of Guzman's constitute a theory?
If we use an informal definition which associates the idea of
useful theory with the idea of description, then certainly
Guzman's work is a theory of the region parsing aspect of vision,
either as described here or as manifested in Guzman's actual
machine program. I must hasten to say, however, that it stands
incomplete on some of the dimensions along which the worth of a
theory can be measured. Guzman's program was insightful and
decisive to future developments, but as he left it, the theory
had little of the deep semantic roots that a good theory should
have.

Let us ask some questions to better understand why the
program works instead of just how it works. When does it do
well? Why? When does it stumble? How can it be improved?

Experimentation with the program confirms that it works
best on scenes composed of objects lacking holes (Winston 1971)
and having trihedral vertices. (A vertex is trihedral when
exactly three faces of the object meet in three-dimensional space
at that vertex.)

Why should this be the case? The answer is simply that
trihedral vertices most often project into a line drawing as L's,
which we ignore, and arrows and forks, which create links. The
program succeeds whenever the weak reverse implication that
arrows and forks come from trihedral vertices happens to be
correct. Using the psi vertex amounts to a corollary which is
necessary because we humans often stack things up and bury an
arrow-fork pair in the resulting alignment. From this point of
view, the Guzman program becomes a one-heuristic theory in which
a link is created whenever a picture vertex may have come from a
trihedral space vertex.

But when does the heuristic fail? Again experiments
provide something of an answer. The trihedral vertex heuristic
most often fails when alignment creates perjurious arrows.
Without some sort of link inhibition mechanism, it is easy to
construct examples littered with bad arrows. To combat poor
evidence, two possibilities must be explored. One is to demand
more evidence, and the other is to find better evidence. The
complexity and much of the arbitrary quality of Guzman's work
result from electing to use more evidence. But using more
evidence was not enough. Guzman was still forced to improve the
evidence via the link inhibition heuristic.

The startling fact discovered by Eugene Freuder is that
link inhibition is enough! With some slight extensions to the
Guzman inhibition heuristics (Rattner 1970), complicated evidence
weighing is unnecessary. A program that binds with one link does
about as well as more involved ones. By going into the semantic
justification for the generation of links, we have a better
understanding of the body linking problem and we have a better,
more adequate program to replace the original one. This was a
serious step in the right direction.
Shadows 

Continuing to trace the development of MIT's scene
understanding programs, the next topic is a sortie into the
question of handling shadows. The first work at MIT on the
subject was done by Orban (Orban 1970). His purpose was to
eliminate or erase shadows from a drawing. The approach was
quite Guzman-like in flavor as Orban worked empirically with
vertices, trying to learn their language and discover heuristic
clues that would help establish shadow hypotheses. He found that
quite complex scenes could be handled through the following
simple facts: 1) a shadow boundary often displays two or more L
type vertices in a row 2) shadow boundaries tend to form psi type
vertices when they intersect a straight line and 3) shadows may
often be found by way of the L's and followed through psi's.

Orban's program is objectionable in the same way Guzman's
is. Namely, it is largely empirical and lacking in firm semantic
roots. The ideas work in some complex scenes only to fail in
others. Particularly troublesome is the common situation where
short shadow boundaries involve no L type vertices.

After Orban's program, the shadow problem remained at
pasture for some time. The issue was avoided by placing the
light source near the eye, thus eliminating the problem by
eliminating the shadows. Aside from being disgusting
aesthetically, this is a poor solution because shadows should be
a positive help rather than a hindrance to be erased and
forgotten.

Interest in shadows was reawakened in conjunction with a
desire to use more knowledge of the three-dimensional world in
scene analysis. Among the obvious facts are the following:

1) The world of blocks and wedges has a preponderance
of vertical lines. Given that a scene has a single
distant light source, these vertical lines all cast
shadows at the same angle on the retina. Hence when
one line is identified as a shadow, it renders all
other lines at the same angle suspect.

2) Vertical lines cast vertical shadows on vertical
faces.

3) Horizontal lines cast shadows on horizontal faces
that are parallel to the shadow casting edges.

4) If a shadow line emerges from a vertex, that vertex
almost certainly touches the shadow bearing surface.
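
The first of these facts lends itself to a small illustration. The sketch below flags every line whose orientation matches that of a line already identified as a shadow. The segment coordinates, the names, and the two-degree tolerance are assumptions chosen for the example.

```python
import math

def orientation(segment):
    """Undirected orientation of a segment in degrees, in [0, 180)."""
    (x1, y1), (x2, y2) = segment
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0

def suspects(known_shadow, segments, tolerance=2.0):
    """Names of segments lying at the same angle as a known shadow line."""
    target = orientation(known_shadow)
    found = []
    for name, seg in segments.items():
        diff = abs(orientation(seg) - target)
        diff = min(diff, 180.0 - diff)  # orientations wrap around at 180
        if diff <= tolerance:
            found.append(name)
    return sorted(found)

# Hypothetical picture lines, invented for illustration.
known = ((0, 0), (2, 1))
segments = {
    "s1": ((5, 5), (9, 7)),   # same slope as the known shadow
    "s2": ((0, 0), (0, 3)),   # vertical edge
    "s3": ((4, 2), (0, 0)),   # same slope, drawn in the opposite direction
}
flagged = suspects(known, segments)
print(flagged)
```

Working with undirected orientations means the direction in which a line happens to be stored does not matter.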

With these facts, it is easy to think about a program
that would crawl through the scene of figure 7, associating
shadow boundaries with their parent edges as shown. One could
even implement something, through point four, that would allow
the system to know that the cube in figure 7 is lying on the
table rather than floating above it. Such a set of programs
would be on the same level as Freuder's refinement of Guzman's
program with respect to semantic flavor. We were in fact on the
verge of implementing such a program when Waltz radicalized our
understanding of both the shadow work and the old body-finding
problem.

Figure 7

Simple heuristics allow shadow lines to be associated with the
edges causing them.

Waltz and Semantic Interpretation

This section deals with the enormously successful work of
Waltz (Waltz 1972a) (Waltz 1972b). Readers familiar with either
the work of Huffman (Huffman 1971) or that of Clowes (Clowes
1971) will instantly recognize that their work is the
considerable foundation on which Waltz's theory rests.

A line in a drawing appears because of one or another of
several possibilities in the physical structure: The line may be
a shadow, it may be a crack between two aligned objects, it may
be the seam between two surfaces we see, or it may be the
boundary between an object and whatever is in back of it.

It is easy enough to label all the lines in a drawing
according to their particular cause in the physical world. The
drawing in figure 8, for example, shows the Huffman labels for a
cube lying flat on the table. The plus labels represent seams
where the observer sees both surfaces and stands on the convex
side of the surfaces with the inside of the object lying on the
concave. The minus labels indicate the observer is on the
concave side. And the arrowed lines indicate a boundary where
the observer sees only one of the surfaces that form the physical
edge.

Figure 8

Huffman labels for a cube. Plus implies a convex edge, minus
implies concave, and an arrow implies only one of the
edge-forming surfaces is visible.

A curious and amazing thing about such labeled line
drawings is that only a few of the combinatorially possible
arrangements of labels around a vertex are physically possible.
We will never see an L type vertex with both wings labeled plus no
matter how many legal line drawings we examine. (It is presumed
that the objects are built of trihedral vertices and that the
viewpoint is such that certain types of coincidental alignment in
the picture domain are lacking.) Indeed it is easy to prove that
an enumeration of all possibilities allowed by three-dimensional
constraints includes only six possible L vertex labelings and
three each of the fork and arrow types. These are shown in
figure 9.

Figure 9

Physically possible configurations of lines around vertices.

Given the constraints the world places on the
arrangements of line labels around a vertex, one can go the other
way. Instead of using knowledge of the real physical structure
to assign semantic labels, one can use the known constraints on
how a drawing can possibly be labeled to get at an understanding
of what the physical structure must be like.

The vertices of a line drawing are like the pieces of a
jigsaw puzzle in that both are limited as to how they can fit
together. Selections for adjacent vertex labelings simply cannot
require different labels for the line between them. Given this
fact a simple search scheme can work through a drawing, assigning
labels to vertices as it goes, taking care that no vertex
labeling is assigned that is incompatible with a previous
selection at an adjacent vertex. If the search fails without
finding a compatible set of labels, then the drawing cannot
represent a real structure. If it does find a set of labels,
then the successful set or sets of labels yield much information
about the structure.

Waltz generalized the basic ideas in two fundamental
ways. First he expanded the set of line labels such that each
includes much more information about the physical situation.
Second, he devised a filtering procedure that converges on the
possible interpretations with lightning speed relative to a more
obvious depth-first search strategy.
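
To make the contrast concrete, here is a minimal sketch of both strategies on a toy problem. The catalogue below is invented for illustration and is not the Huffman, Clowes, or Waltz vertex table; labels are treated as plain symbols, so the direction carried by arrow labels is ignored. The filtering function follows the general idea of discarding any vertex labeling that no neighbor can agree with, and should be read as one plausible rendering rather than Waltz's program.

```python
def label_drawing(vertices, catalogue):
    """Depth-first search for one consistent labeling, or None.

    vertices:  {vertex name: (vertex type, tuple of incident line names)}
    catalogue: {vertex type: allowed label tuples, one label per line}
    """
    order = list(vertices)

    def search(index, line_labels):
        if index == len(order):
            return line_labels
        kind, lines = vertices[order[index]]
        for choice in catalogue[kind]:
            trial = dict(line_labels)
            # A line already labeled by an earlier vertex must keep its label.
            if all(trial.setdefault(line, label) == label
                   for line, label in zip(lines, choice)):
                result = search(index + 1, trial)
                if result is not None:
                    return result
        return None

    return search(0, {})

def agrees(lines_a, choice_a, lines_b, choice_b):
    """True if two vertex labelings give every shared line the same label."""
    labels_b = dict(zip(lines_b, choice_b))
    return all(labels_b.get(line, label) == label
               for line, label in zip(lines_a, choice_a))

def waltz_filter(vertices, catalogue):
    """Discard labelings with no compatible neighbor until nothing changes."""
    candidates = {v: list(catalogue[kind]) for v, (kind, _) in vertices.items()}
    changed = True
    while changed:
        changed = False
        for v, (_, lines_v) in vertices.items():
            for w, (_, lines_w) in vertices.items():
                if v == w or not set(lines_v) & set(lines_w):
                    continue
                kept = [c for c in candidates[v]
                        if any(agrees(lines_v, c, lines_w, d)
                               for d in candidates[w])]
                if len(kept) < len(candidates[v]):
                    candidates[v] = kept
                    changed = True
    return candidates

# Toy catalogue (hypothetical): an "L" must carry two different labels.
CATALOGUE = {"pinned": [("+",)], "L": [("+", "-"), ("-", "+")]}
chain = {"p": ("pinned", ("a",)), "q": ("L", ("a", "b")), "r": ("L", ("b", "c"))}
triangle = {"u": ("L", ("a", "b")), "v": ("L", ("b", "c")), "w": ("L", ("c", "a"))}

found = label_drawing(chain, CATALOGUE)
filtered = waltz_filter(chain, CATALOGUE)
impossible = label_drawing(triangle, CATALOGUE)
```

On the chain, filtering alone narrows every vertex to a single labeling without any backtracking. On the triangle, filtering removes nothing even though the search shows no labeling exists, so in this sketch a search is still needed after filtering to confirm an interpretation.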

Waltz's labels carry information both about the cause of
the line and about the illumination on the two adjacent regions.
Figure 10 gives Waltz's eleven allowed line interpretations. The
set includes shadows and cracks. The regions beside the line are
considered to be either illuminated, shadowed by facing away from
the light, or shadowed by another object. These possibilities
suggest that the set of legal labels would include 11 x 3 x 3 =
99 entries, but a few simple facts immediately eliminate about
half of these. A concave edge may not, for example, have one
constituent surface illuminated and the other shadowed.
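
The count can be checked by direct enumeration. The sketch below builds all 99 combinations and applies only the single rule quoted above, under the assumed reading that "shadowed" covers both kinds of shadow. The illumination names are invented for the example; the line types are identified only by number, with 8 taken as the concave edge as in figure 10.

```python
from itertools import product

LINE_TYPES = range(1, 12)  # the eleven line interpretations, by number
ILLUMINATION = ("lit", "self-shadowed", "cast-shadowed")
CONCAVE = 8                # the concave edge is number 8 in figure 10

combos = list(product(LINE_TYPES, ILLUMINATION, ILLUMINATION))

def violates_concave_rule(line, side_a, side_b):
    """Assumed reading: a concave edge cannot pair a lit face with a shadowed one."""
    if line != CONCAVE:
        return False
    return (side_a == "lit") != (side_b == "lit")

legal = [c for c in combos if not violates_concave_rule(*c)]
print(len(combos), len(legal))
```

This one rule removes only four entries; the remaining eliminations that bring the set down to about half come from further facts that are not spelled out here.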

With this set of labels, body finding is easy! The line
labels with arrows as part of their symbol (two, three, four,
five, nine, ten, and eleven) indicate places where one body
obscures another body or the table. Once Waltz's program finds a
compatible set of line labels for a drawing, each body is
surrounded by line labels from the arrow class.
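
One plausible way to read bodies off a labeled drawing is to merge regions that meet across any line outside the arrow class. The sketch below assumes that reading; the scene, the region names, and the choice of labels on each line are hypothetical, and only the membership of the arrow class comes from the text.

```python
ARROW_CLASS = {2, 3, 4, 5, 9, 10, 11}  # labels whose symbol includes an arrow

def bodies_from_labels(regions, labeled_lines):
    """labeled_lines holds (region, region, label number) triples.

    Regions separated only by non-arrow lines are taken to share a body.
    """
    parent = {r: r for r in regions}

    def find(r):
        while parent[r] != r:
            r = parent[r]
        return r

    for a, b, label in labeled_lines:
        if label not in ARROW_CLASS:
            parent[find(a)] = find(b)

    groups = {}
    for r in regions:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical scene: cube faces A, B, C; table T; a shadowed patch S on it.
regions = ["A", "B", "C", "T", "S"]
labeled_lines = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),  # non-arrow lines within the cube
    ("A", "T", 2), ("B", "T", 3), ("C", "S", 2),  # arrow-class lines around it
    ("S", "T", 6),                                # shadow line across the table
]
bodies = bodies_from_labels(regions, labeled_lines)
print(bodies)
```

The shadowed patch ends up grouped with the table because the shadow line between them is not in the arrow class.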

To create his program, Waltz first worked out what vertex
configurations are possible with his set of line labels. Figure
11 gives the result. Happily the possible vertex labelings
constitute only a tiny fraction of the ways labels can be arrayed
around a vertex. The number of possible vertices is large but
not unmanageably so.

Figure 10

Line interpretations recognized by Waltz's program. The legend
reads:

Obscuring edges: obscuring body lies to right of arrow's
direction.
Cracks: obscuring body lies to right of arrow's direction.
Shadows (6, 7): arrows point to shadowed region.
Concave edge (8).
Separable concave edges: obscuring body lies to right of arrow's
direction; double arrow indicates that three bodies meet along
the line.

Figure 11

Only a few of the combinatorially possible labelings are
physically possible.

Increasing the number of legal vertex labelings does not
increase the number of interpretations of typical line drawings.
This is because a proper increase in descriptive detail strongly
constrains the way things may go together. Again the analogy
with jigsaw puzzles gives an idea of what is happening: The
shapes of pieces constrain how they may fit together, but the
colors give still more constraint by adding another dimension of
comparison.

Interestingly, the number of ways to label a fork is much
larger than the number for an arrow. A single arrow consequently
offers more constraint and less ambiguity than does a fork. This
explains why experiments with Guzman's program showed arrows to
be more reliable than forks as sources of good links.

Figure 12 shows a fairly complex scene. But with little
effort, Waltz's program can sort out the shadow lines and find
the correct number of bodies.

What I have discussed of this theory so far is but an
hors d'oeuvre. Waltz's forthcoming doctoral dissertation has
much to say about handling coincidental alignment, finding the
approximate orientation of surfaces, and dealing with higher
order object relations like support (Waltz 1972b). But without
getting into these exciting results, I can comment on how his
work fits together with previous ideas on body finding and on
shadows.

First of all Waltz's program has a syntactic flavor. The
program has a table of possible vertices and on some level can be
thought to parse the scene. But it is essential to understand
that this is a program with substantive semantic roots. The
table is not an amalgam of the purely ad hoc and empirical. It
is derived directly from arguments about how real structures can
project onto a two dimensional drawing. The resulting label set,
together with the program that uses it, can be thought of quite
well as a compiled form of those arguments whereby facts about
three-dimensional space become constraints on lines and vertices.

In retrospect, I see Waltz's work as the culmination of a
long effort beginning with Guzman and moving through the work of
Orban, Rattner, Winston, Huffman and Clowes. Each person built on
earlier ideas and experiments, producing either a refinement, a
reaction, or an explanation. The net result is a tradition
moving toward more and better ability to describe and toward more
and better theoretical justification behind working programs.

SYSTEM ISSUES
Heterarchy

Waltz's work is part of understanding how line drawings
convey information about scenes. This section discusses some of
our newer ideas about how to get such understanding into a
working system.

At MIT the first success in copying a simple block
structure from spare parts involved using a pass-oriented
structure like that illustrated in figure 13. The solid lines
represent data flow and the dashed lines, control. The executive
in this approach is a very simple sequence of subroutine calls,
mostly partitioned into one module. The calling up of the action
modules is fixed in advance and the order is indifferent to the
peculiarities of the scene. Each action module is charged with
augmenting the data it receives according to its labeled
specialty.
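
In outline, such a pass-oriented executive might look like the sketch below. The module names and the strings they produce are placeholders invented for the example and do not correspond to the actual MIT modules.

```python
def find_lines(data):
    data["lines"] = "lines from " + data["image"]
    return data

def find_regions(data):
    data["regions"] = "regions from " + data["lines"]
    return data

def find_bodies(data):
    data["bodies"] = "bodies from " + data["regions"]
    return data

# The order of the passes is fixed in advance, whatever the scene.
PASSES = [find_lines, find_regions, find_bodies]

def executive(image):
    """Call each action module once; each augments the data it receives."""
    data = {"image": image}
    for action_module in PASSES:
        data = action_module(data)
    return data

result = executive("scene-1")
print(result["bodies"])
```

Nothing a later module learns can influence an earlier one, since control never returns to a pass once it has finished.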

This kind of organization does not work well. Its
purpose was to make a vehicle quickly for testing the modules
then available. It is often better to have one system working
before expending too much effort in arguing about which system is
best.

Figure 13

The pass oriented system metaphor.

From this base we have moved toward another style of
organization which has come to be called heterarchical (Minsky
and Papert 1972). The concept lacks precise definition, but the
following are some of the characteristics that we aim for:

1. A complex system should be goal oriented.
Procedures at all levels should be short and associated
with some definite goal. Goals should normally be
satisfied by invoking a small number of subgoals for
other procedures or by directly calling a few
primitives. A corollary is that the system should be
top down. For the most part nothing should be done
unless necessary to accomplish something at a higher
level.



2. The executive control should be distributed
throughout the system. In a heterarchical system, the
modules interact not like a master and slaves but more
like a community of experts.

3. Programmers should make as few assumptions as
possible about the state in which the system will be
when a procedure is called. The procedure itself
should contain the necessary machinery to set up
whatever conditions are required before it can do its
job. This is obviously of prime importance when many
authors contribute to the system, for they should be
able to add knowledge via new code without completely
understanding the rest of the system. In practice this
usually works out as a list of goals lying like a
preamble near the beginning of a routine. Typically
these goals are satisfied by simple reference to the
data base, but if not, notes are left as to where help
may be found, in the PLANNER (Hewitt 197?) or CONNIVER
style (McDermott and Sussman 197?).

4. The system should contain some knowledge of itself.
It is not enough to think of executives and primitives.
There should be modules that act as critics and
complain when something looks suspicious. Others must
know how and when the primitives are likely to fail.
Communication among these modules should be more
colorful than mere flow of data and command. It should
include what in human discourse would be called advice,
suggestions, remarks, complaints, criticism, questions,
answers, lies, and conjectures.

5. A system should have facilities for tentative
conclusions. The system will detect mistakes as it
goes. A conjectured configuration may be found to be
unstable or the hand may be led to grasp air. When
this happens, we need to know what facts in the data
base are most problematical; we need to know how to
try to fix things; and we need to know how far-ranging
the consequences of a change are likely to be.
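
The goal preamble described in the third point can be sketched in a few lines. This is a loose illustration in Python, not PLANNER or CONNIVER; the `method` decorator, the `achieve` function, the goal names, and the use of a set as the data base are all assumptions made for the example.

```python
database = set()  # facts currently believed
methods = {}      # goal -> procedure able to achieve it
trace = []        # order in which procedures actually ran

def method(goal, needs=()):
    """Register a procedure under its goal, with a preamble of subgoals."""
    def register(body):
        def procedure():
            for subgoal in needs:  # preamble: set up required conditions
                achieve(subgoal)
            trace.append(goal)
            body()
            database.add(goal)
        methods[goal] = procedure
        return procedure
    return register

def achieve(goal):
    if goal in database:  # satisfied by simple reference to the data base
        return
    methods[goal]()       # otherwise turn to whoever can help

@method("lines-found")
def find_lines():
    pass

@method("regions-found", needs=("lines-found",))
def find_regions():
    pass

@method("bodies-found", needs=("regions-found", "lines-found"))
def find_bodies():
    pass

achieve("bodies-found")
print(trace)
```

Each procedure states what it needs rather than assuming a calling order, so the sequence of calls emerges from the goals and from what the data base already holds.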

Graphically such a system looks more like a network of
procedures than an orderly, immutable sequence. Each procedure
is connected to others via potential control transfer links. In
practice which of these links is used depends on the context in
which the various procedures are used, the context being the
joint product of the system and the problem undergoing analysis.

Note particularly that this arrangement forces us to 
refine our concept of higher versus lower level routines. Now
programs normally thought to be low level may very well employ 
other programs considered high level. The terms no longer 
indicate the order in which a routine occurs in analysis. 
Instead a vision system procedure is high or low level according 
to the sort of data it works with. Line finders that work with 
intensity points are low level but may certainly on occasion call 
a stability tester that works with relatively high level object 
models. 
Finin and Environment Driven Analysis

Our earliest MIT vision system interacted only narrowly 
and in a predetermined way with its environment. The pass 
oriented structure prevents better interaction. But we are now
moving toward a different sort of vision system in which the 
environment controls the analysis. (This idea was prominent in 
Ernst's very early work (Ernst 1961).) 

Readers who find this idea strange should see an
exposition of the notion by Simon (Simon 1969). He argues that
much of what passes as intelligent behavior is in point of fact a
happy cooperation between unexpectedly simple algorithms and
complex environments. He cites the case of an ant wandering
along a beach rife with ant sized obstacles. The ant's
curvaceous path might seem to be an insanely complex ritual to
someone looking only at a history of it traced on paper. But in
fact the humble ant is merely trying to circumvent the beach's
obstacles and go home.

Watching the locus of control of our current system as it
struggles with a complicated scene is like watching Simon's ant.
The up and down, the around and backing off, the use of this
method then another, all seem to be mysterious at first. But
like the ant's, the system's complex behavior is the product of
simple algorithms coupled together and driven by the demands of
the scene. The remainder of this section discusses some elegant
procedures implemented by Finin which illustrate two ways in
which the environment influences the MIT vision system (Finin
1972).

The vision system contains a specialist whose task is to 
determine what we call the skeleton of a brick. A skeleton 
consists of a set of three lines, one lying along each of the 
three axes (Finin 1972). Each of the lines in a skeleton must be 
complete and unobscured so that the dimensions of the brick in 
question may be determined. Figure 14 shows some of the
skeletons found in various situations by this module.

Figure 14

Some skeletons found for bricks.

The only problem with the program lies in the fact that
complete skeletons are moderately rare in practice because of
heavy obscuring. Even in the simple arch in figure 15a, one
object, the left side support, cannot be fully analyzed, lacking
as it does a completely exposed line in the depth dimension. But
humans have no trouble circumventing this difficulty. Indeed, it
generally does not even occur to us that there is a problem
because we so naturally assume that the right and left supports
have the same dimensions. At this point let us look at the
system's internal discourse when working on this scene to better
understand how a group - hypothesize - criticize cycle typically
works out:

Let me see, what are A's dimensions? First I must identify
a skeleton. Oops! We can only get a partial skeleton: two
complete lines are there, but only a partial line along the
third brick axis. This means I know two dimensions but I
have only a lower bound on the third. Let me see if A is
part of some group. Oh yes, A and B both support C so they
form a group of a sort. Let me therefore hypothesize that A
and B are the same and run through my check list to see if
there is any reason to doubt that.

Are A and B the same sort of objects?
Yes. Both are bricks.

Are they both oriented the same way?
Yes, that checks out too.

Well, do the observable dimensions match?
Indeed.

Is there any reason to believe the unobservable
dimension of A is different from its analogue on B?
No.

OK. Everything seems all right. I will
tentatively accept the hypothesis and proceed.

Figure 15

In one case, A's depth is extrapolated from B's. In the other no
hypothesis can be confirmed.

Through this internal dialogue, the machine succeeds in
finding all the necessary dimensions for the obscured support in
figure 15a. Figure 15b shows how the conflict search can fail at
the very last step.
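
The check list in the dialogue above can be sketched in a few lines of Python. This is an illustration only, not the MIT system's code: the Brick record, its field names, the tolerance, and the function hypothesize_depth are all assumptions introduced here, and the last test (a visible partial line already longer than the group mate's depth) is just one plausible kind of conflict that could defeat the hypothesis at the final step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Brick:
    kind: str                 # e.g. "brick" or "wedge"
    posture: str              # e.g. "standing" or "lying"
    width: float              # fully observed dimension
    height: float             # fully observed dimension
    depth: Optional[float]    # None when the depth line is obscured
    depth_lower_bound: float = 0.0   # length of the visible part of the depth line


def hypothesize_depth(a: Brick, b: Brick, tol: float = 0.1) -> Optional[float]:
    """Return a tentative depth for `a`, copied from its group mate `b`,
    or None if any item on the check list raises an objection."""
    if b.depth is None:
        return None                       # nothing to extrapolate from
    if a.kind != b.kind:
        return None                       # not the same sort of object
    if a.posture != b.posture:
        return None                       # not oriented the same way
    if abs(a.width - b.width) > tol or abs(a.height - b.height) > tol:
        return None                       # observable dimensions disagree
    if a.depth_lower_bound > b.depth + tol:
        return None                       # visible part of a is already too long
    return b.depth                        # tentatively accept the hypothesis
```

For instance, a left support with two known dimensions and a lower bound of 1.0 on its depth would inherit a depth of 2.0 from a matching right support, while a lower bound of 3.0 would cause the hypothesis to be rejected.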

Grouping amounts, of course, to using a great deal of
context in scene analysis. We have discussed how the system uses
groups to hypothesize properties for the group's members and we
should add that the formation of a group is in itself a matter of
hypothesis followed by a search for evidence conflicting with the
hypothesis. The system now forms group hypotheses from the
following configurations, roughly in order of grouping strength:

1. Stacks or rows of objects connected by chains of
support or in-front-of relations.

2. Objects that serve the same function such as the
sides of an arch or the legs of a table.

3. Objects that are close together.

4. Objects that are of the same type.

To test the validity of these hypotheses, the machine
makes tests of good membership on the individual elements. It
basically performs conformity tests, throwing out anything too
unusual. There is a preliminary theory of how this can be done
sensibly (Winston 1971). The basic feature of Winston's theory
is that it involves not only a measure of how distant a
particular element is from the norm, but also of how much
deviation from the norm is typical and thus acceptable.
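
A conformity test of this general kind might look like the following Python sketch. The details are assumptions made for illustration and are not taken from (Winston 1971): the norm is taken to be the mean of the other members, the typical deviation is their mean absolute deviation, and the threshold and floor parameters are hypothetical.

```python
from statistics import mean


def nonconforming(values, threshold=2.0, floor=0.05):
    """Return the indices of group members whose distance from the norm
    is large compared with the deviation typical of the other members.

    values: list of at least three numbers, one measurement per member.
    floor: smallest typical deviation considered, so that a perfectly
           uniform group does not reject a member for a tiny difference.
    """
    rejected = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        norm = mean(others)                              # the norm of the rest
        typical = mean(abs(x - norm) for x in others)    # typical deviation
        if abs(v - norm) > threshold * max(typical, floor):
            rejected.append(i)
    return rejected
```

With the heights 2.0, 2.1, 1.9, 2.0 and 5.0, only the last member is thrown out: its distance from the norm is many times the deviation typical of the others, while each of the others lies well within the spread seen among the rest.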

Note that this hypothesis-rooted theory is much different 
from Gestaltist notions of good groups emerging magically from 
the set of all possible groups. Critics of artificial
intelligence correctly point out the computational implausibility 
of considering all possible groups but somehow fail to see the 
alternative of using clues to hypothesize a limited number of 
good candidate groups. 

Naturally all of these group - hypothesize - criticize
efforts are less likely to work out than are programs which
operate through direct observation. It is therefore good to
leave data base notes relating facts both to their degree of
certainty and to the programs that found them. Thus an assertion
that says a particular brick has such and such a size may well
have other assertions describing it as only probable, conjectured
from the dimensions of a related brick, and owing the discovered
relationship to a particular grouping program. Using such
knowledge is as yet only planned, but in preparation we try to
refrain from using more than one method in a single program.
This makes it easy to describe how a particular assertion was
made by simply noting the name of the program that made it.
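
One way to picture such data base notes is the Python sketch below. It is hypothetical throughout: the record layout, the certainty labels, and the program name GROUP-DIMENSIONS are invented for illustration (SKELETON is the only program name taken from the text), and the real data base stores list structures rather than records like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    subject: str
    relation: str
    value: object
    certainty: str = "observed"   # "observed" or "conjectured"
    source: str = ""              # name of the program that made the assertion
    basis: tuple = ()             # related objects the conjecture rests on


database = [
    # Found by direct observation of a complete skeleton.
    Assertion("B", "HAS-DIMENSIONS", (2.0, 4.0, 2.0), "observed", "SKELETON"),
    # Conjectured from the dimensions of the related brick B.
    Assertion("A", "HAS-DIMENSIONS", (2.0, 4.0, 2.0), "conjectured",
              "GROUP-DIMENSIONS", ("B",)),
]


def conjectures(db):
    """Return the assertions that did not come from direct observation."""
    return [a for a in db if a.certainty != "observed"]
```

Because each program uses a single method, the source field alone is enough to say how an assertion was obtained.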

Visual observation of movement provides another way the
environment can influence and control what a vision system thinks
about. One of the first successful projects was executed at
Stanford (Wichman 1967). The purpose was to align two bricks,
one atop the other. The method essentially required the complete
construction of a line drawing with subsequent determination of
relative position. The Japanese have used a similar approach in
placing a block inside a box.

The MIT entry into this area is a little different. We
do not require complete recomputation of a scene, as did the
Stanford system. The problem is to check the position of a just
placed object to be sure it lies within some tolerance of the
assigned place for it. (In our arm errors in placement may
occasionally be on the order of 1/2".)

Rather than recompute a line drawing of the scene to find
the object's coordinates, we use our model of where the object
should be to direct the eye to selected key regions. In brief,
what happens is as follows:

1. The three-dimensional coordinates for selected
vertices are determined for the object whose position
is to be checked.

2. Then the supposed locations of these vertices on
the eye's retina are easily computed.

3. A vertex search using circular scans around each of
these supposed vertex positions hill climbs to a set of
actual coordinates for the vertices on the retina
(Winston and Lerman 1972). Revised three-dimensional
coordinates can be determined from these retinal
coordinates, given the altitude of the object.

4. Comparing the object's real and supposed
coordinates gives a correction which is then effected
by a gentle, wrist-dominated arm action.
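
The final comparison in step 4 can be sketched as follows. This Python fragment is a simplification under stated assumptions, not the actual feedback program: it assumes the revised vertex coordinates have already been recovered from the retina, treats the placement error as a pure translation in the table plane, and uses a hypothetical tolerance; the projection onto the retina and the circular-scan hill climbing are not modeled.

```python
from math import hypot


def placement_correction(supposed, actual, tolerance=0.1):
    """Return the (dx, dy) displacement that would move the object from
    where it was found to where it was supposed to be.

    supposed, actual: equal-length lists of (x, y) coordinates for the
    same selected vertices, in table-plane units.
    Returns (0.0, 0.0) when the mean error is within the tolerance.
    """
    n = len(supposed)
    dx = sum(s[0] - a[0] for s, a in zip(supposed, actual)) / n
    dy = sum(s[1] - a[1] for s, a in zip(supposed, actual)) / n
    if hypot(dx, dy) <= tolerance:
        return (0.0, 0.0)      # close enough; no arm action needed
    return (dx, dy)
```

A block found half a unit too far along x yields a correction of (-0.5, 0.0), while an error of a hundredth of a unit is ignored.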

The vertex-locating program tries to avoid vertices that
form alignments with those of other objects already in place.
This considerably simplifies the work of the vertex finder. With
a bit more work, the program could be made to avoid vertices
obscured by the hand, thus allowing performance of the feedback
operation more dynamically, without withdrawing the hand.

LEARNING TO IDENTIFY TOY BLOCK STRUCTURES

Learning

This section describes a working computer program which
embodies a new theory of learning (Winston 1970). I believe it
is unlike previous theories because its basic idea is to
understand how concepts can be learned from a few judiciously
selected examples. The sequence in Figure 16, for example,
generates in the machine an idea of the arch sufficient to handle
correctly all the configurations in figure 17 in spite of severe
rotations, size changes, proportion changes and changes in
viewing angle.

Although no previous theory in the artificial
intelligence, psychology, or other literatures can completely
account for anything like this competence, the basic ideas are
quite simple:

1. If you want to teach a concept, you must first be
sure your student, man or machine, can build
descriptions adequate to represent that concept.

2. If you want to teach a concept, you should use
samples which are a kind of non-example.

The first point on description should be clear. At some
level we must have an adequate set of primitive concepts and
relations out of which we can assemble interesting concepts at
the next higher level which in turn become the primitives for
concepts at a still higher level. The operation of the learning
program depends completely on the power of the analysis programs
described in the previous sections.

Figure 16

An arch training sequence.

Figure 17

Structures recognized as arches.

But what is meant by the second claim that one must show
the machine not just examples of concepts but something else?
First of all, something else means something which is close to
being an example but fails to be admissible by way of one or a
few crucial deficiencies. I call these samples near-misses. My
view is that they are more important to learning than examples
and they provide just the right information to teach the machine
directly, via a few samples, rather than laboriously and
uncertainly through many samples in some kind of reinforcement
mode.

The purpose of this learning process is to create in the 
machine whatever is needed to identify instances of learned 
concepts. This leads directly to the notion of a model. To be
precise, I use the term as follows: 

A model is a proper description augmented by
information about which elements of the description are
essential and by information about what, if anything,
must not be present in examples of the concept.

The description must be a proper description because the
descriptive language (the possible relations) must naturally
be appropriate to the definitions expected. For this reason one
cannot build a model on top of a data base that describes the
scene in terms of only vertex coordinates, for such a description
is on too low a level. Nor can one build a model on top of a
higher level description that contains only color information,
for example, because that information is usually irrelevant to
the concept in question.

The key part of the definition of model is the idea that
some elements of the description must be underlined as
particularly important. Figure 18 shows a training sequence that
conveys the idea of the pedestal. The first step is to show the
machine a sample of the concept to be learned. From a line
drawing, the scene analysis routines produce a hierarchical
symbolic description which carries the same sort of information
about a scene that a human uses and understands. Blocks are
described as bricks or wedges, as standing or lying, and as
related to others by relations like in-front-of or supports.

This description resides in the data base in the form of
list structures, but I present it here as a network of nodes and
pointers, the nodes representing objects and the pointers
representing relations between them. See figure 19 where a
pedestal network is shown. In this case, there are relatively
few things in the net: just a node representing the scene as a
whole and two more for the objects. These are related to each
other by the supported-by pointer and to the general knowledge of
the net via pointers like is-a, denoting set membership, and
has-posture, which leads in one case to standing and in the other
to lying.

Figure 19

A pedestal description.

Now in the pedestal, the support relation is essential;
there is no pedestal without it. Similarly the posture and
identity of the board and brick must be correct. Therefore, the
objective in a teaching sequence is to somehow convey to the
machine the essential, emphatic quality of those features. (Later
on we will see further examples where some relations become less
essential and others are forbidden.)

Returning to figure 18 note that the second sample is a
near-miss in which nothing has changed except that the board no
longer rests on the standing brick. This is reflected in the
description by the absence of a supported-by pointer. It is a
simple matter for a description comparison program to detect this
missing relation as the only difference between this description
and the original one which was an admissible instance. The
machine can only conclude, as we would, that the loss of this
relation explains why the near-miss fails to qualify as a
pedestal. This being the case, the proper action is clear. The
machine makes a note that the supported-by relation is essential
by replacing the original pointer with must-be-supported-by.
Again note that this point is conveyed directly by a single
drawing, not by a statistical inference from a boring hoard of
trials. Note further that this information is quite high level.
It will be discerned in scenes as long as the descriptive
routines have the power to analyze that scene. Thus we need not
be as concerned about the simple changes that incapacitate older,
lower level learning ideas. Rotations, size dilations and the
like are easily handled, given the descriptive power we have in
operating programs.
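
The pedestal step just described can be sketched in Python. This is a minimal illustration under assumptions of my own, not the description comparison program itself: a description is represented as a set of (node, relation, node) triples, the node names are hypothetical, the nodes of the two descriptions are taken to be already matched, and only the case of exactly one missing pointer is handled.

```python
def learn_from_near_miss(model, near_miss):
    """Return a refined model in which the one relation missing from the
    near-miss has been made emphatic with a must-be- prefix.

    model, near_miss: sets of (node, relation, node) triples.
    If the near-miss lacks zero or several relations, the model is
    returned unchanged; ranking several differences is outside this sketch.
    """
    missing = model - near_miss
    if len(missing) != 1:
        return model
    a, rel, b = next(iter(missing))
    return (model - missing) | {(a, "must-be-" + rel, b)}


# The pedestal example and its near-miss (board no longer on the brick).
pedestal = {
    ("BOARD", "supported-by", "BRICK"),
    ("BOARD", "has-posture", "lying"),
    ("BRICK", "has-posture", "standing"),
}
near_miss = pedestal - {("BOARD", "supported-by", "BRICK")}
refined = learn_from_near_miss(pedestal, near_miss)
```

After this single near-miss, refined contains (BOARD must-be-supported-by BRICK) in place of the original pointer, with the two posture pointers untouched.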

Continuing now with our example, the teacher proceeds to
basically strengthen the other relations according to whatever
prejudices he has. In this sequence the teacher has chosen to
reinforce the pointers which determine that the support is
standing and the pointers which similarly determine that the
supported object is a lying board. Figure 20 shows the model
resulting.

Now that the basic idea is clear, the slightly more
complex arch sequence will bring out some further points. The
first sample, shown back in Figure 16, is an example, as always.
From it we generate an initial description as before. The next
step is similar to the one taken with the pedestal in that the
teacher presents a near-miss with the supported object now
removed and resting on the table. But this time not one, but two
differences are noticed in the corresponding description networks
as now there are two missing supported-by pointers.

This opens up the big question of what is to be done when
more than one relationship can explain why the near-miss misses.
What is needed, of course, is a theory of how to sort out
observed differences so that the most important and most likely
to be responsible difference can be hypothesized and reacted to.

The theory itself is somewhat detailed, but it is the
exploration of this detail through writing and experimenting with
programs that gives the overall theory a crisp substance.
Repeated cycles of refinement and testing of a theory, as
embodied in a program, are an important part of an emerging
artificial intelligence methodology.

Now the results of this approach on the difference
ranking module itself include the following points:

First of all, if two differences are observed which are
of the same nature and description, then they are assumed to
contribute jointly to the failure of the near-miss and both are
acted on. This handles the arch case where two support relations
were observed to be absent in the near-miss. Since the
differences are both of the missing pointer type and since both
involve the same supported-by relation, it is deemed
heuristically sound to handle them both together as a unit.

Secondly, differences are ranked in order of their
distance from the origin of the net. Thus a difference observed
in the relationship of two objects is considered more important
than a change in the shape of an object's face, which in turn is
interpreted as more important than an obscured vertex.

Thirdly, differences at the same level are ranked
according to type. In the current implementation, differences of
the missing pointer type are ranked ahead of those where a
pointer is added in the near-miss. This is reasonable since
dropping a pointer to make a near-miss may well force the
introduction of a new pointer. Indeed we have ignored the
introduction of a support pointer between the lying brick and the
table because the difference resulting from this new pointer is
inferior to the difference resulting from the missing pointer.
Finally, if two differences are found of the same type on the
same level, then some secondary heuristics are used to try to
sort them out. Support relations, for example, make more
important differences than one expects from touch or left-right
pointers.
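
The ranking rules above can be gathered into one small Python sketch. It is an illustration under assumptions, not the difference ranking module: each difference is represented as a hypothetical tuple (level, kind, relation, nodes), where level is the distance from the origin of the net, and the secondary heuristic is reduced to ranking supported-by ahead of every other relation.

```python
TYPE_RANK = {"missing": 0, "added": 1}   # missing pointers rank first
RELATION_RANK = {"supported-by": 0}      # all other relations rank 1


def rank_differences(diffs):
    """Group and order the differences between an example and a near-miss.

    diffs: list of (level, kind, relation, nodes) tuples, where kind is
    "missing" or "added". Differences of the same level, kind and
    relation are grouped to be acted on jointly. Groups are returned
    most important first.
    """
    def key(d):
        level, kind, relation, _ = d
        return (level, TYPE_RANK[kind], RELATION_RANK.get(relation, 1), relation)

    groups = {}
    for d in sorted(diffs, key=key):      # dicts keep insertion order
        groups.setdefault(key(d), []).append(d)
    return list(groups.values())


# The arch near-miss: the lying brick C now rests on the table.
arch_diffs = [
    (1, "added", "supported-by", ("C", "TABLE")),
    (1, "missing", "supported-by", ("C", "A")),
    (1, "missing", "supported-by", ("C", "B")),
]
ranked = rank_differences(arch_diffs)
```

Here the top-ranked group holds both missing supported-by pointers, to be handled as a unit, and the added pointer to the table falls into a second, inferior group.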

Now these factors constitute only a theory of hypothesis
formation. The theory does make mistakes, especially if the
teacher is poor. I will return to this problem after completing
the tour through the arch example. Recall that the machine
learned the importance of the support relations. In the next
step it learns, somewhat indirectly, about the hole. This is
conveyed through the near-miss with the two side supports
touching. Now the theory of most important differences reports
that two new touch pointers are present in the near-miss,
symmetrically indicating that the side supports have moved
together. Here surely the reasonable conclusion is that the new
pointers have fouled the concept. The model is therefore refined
to have must-not-touch pointers between the nodes of the side
supports. This dissuades identification programs, later
described, from ever reporting an arch if such a forbidden
relation is in fact present.

It is now clear how crucial information of the negative
sort is introduced into models. They can contain not only
information about what is essential but also information about
what sorts of characteristics prevent a sample from being
associated with the modeled concept.

So far I have shown examples of emphatic relations, both
of the must-be and must-not-be type, as introduced by near-miss
samples. The following is an example of the inductive
generalization introduced by the sample with the lying brick
replaced by a wedge. Whether to call this a kind of arch or
report it as a near-miss depends on the taste of the machine's
instructor, of course. Let us explore the consequence of
introducing it as an example, rather than a near-miss.

In terms of the description network comparison, the
machine finds an is-a pointer moved over from brick to wedge.
There are, given this observation, a variety of things to do.
The simplest is to take the most conservative stance and form a
new class, that of the brick or wedge, a kind of superset.

To see what other options are available, look in figure
21 at the descriptions of brick and wedge and the portion of the
general knowledge net that relates them together. There various
sets are linked together by the a-kind-of relationship. From
this diagram we see that our first choice was a conservative
point on a spectrum whose other end suggests that we move the
is-a pointer over to object, object being the most distant
intersection of a-kind-of relations. We choose a conservative
position and fix the is-a pointer to the closest observed
intersection, in this case right-prism.
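
Finding the closest intersection of a-kind-of relations amounts to a nearest common ancestor search, which the following Python sketch illustrates. The little net here is an assumption: the text names only brick, wedge, right-prism and object, and the direct links between them, as well as the restriction to one a-kind-of pointer per set, are simplifications made for the example.

```python
# Hypothetical fragment of the general knowledge net: child -> parent.
A_KIND_OF = {
    "brick": "right-prism",
    "wedge": "right-prism",
    "right-prism": "object",
}


def ancestors(cls):
    """Return cls followed by everything reachable via a-kind-of,
    nearest first."""
    chain = [cls]
    while chain[-1] in A_KIND_OF:
        chain.append(A_KIND_OF[chain[-1]])
    return chain


def closest_intersection(a, b):
    """Return the nearest set that both a and b are a kind of, or None."""
    above_b = set(ancestors(b))
    for c in ancestors(a):
        if c in above_b:
            return c
    return None
```

With this net, closest_intersection("brick", "wedge") is right-prism, the conservative choice, while the last element of ancestors("brick") is object, the most distant intersection.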

Again a hypothesis has to be made, and the hypothesis may
well be wrong. In this case it is a question of difference
interpretation rather than the question of sorting out the
correct difference from many, but the effect is the same. There
simply must be mechanisms for detecting errors and correcting
them.

Errors are detected when an example refutes a previously
made assumption. If the first scene of Figure 22 is reported as
an example of concept X while the second is given as a near-miss,
the natural interpretation is that an X must be standing. But an
alternate interpretation, considered secondary by the ranking
program, is that an X must not be lying. If a shrewd teacher
wishes to force the secondary interpretation, he need only give
the tilted brick as an example, for it has no standing pointer
and thus is a contradiction to the primary hypothesis. Under
these conditions, the system is prepared to back up and try an
alternative. As the alternative may also lead to trouble, the
process of backup may iterate as a pure depth first search. One
could do better by devising a little theory that would back up
more intelligently to the decision most likely to have caused the
error.
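
The backup process can be pictured as an ordinary depth first search over ranked interpretations. The Python sketch below is an illustration only: the function names, the representation of an example as the set of posture pointers it carries, and the two interpretation labels are all assumptions introduced here.

```python
def choose_interpretations(decisions, consistent):
    """Depth first search for one interpretation per decision.

    decisions: list of lists; each inner list holds the candidate
               interpretations for one decision, best ranked first.
    consistent: function taking a tuple of the interpretations chosen
                so far and returning False if some example refutes them.
    Returns the first complete consistent tuple of choices, or None.
    """
    def search(chosen):
        if not consistent(chosen):
            return None                   # back up
        if len(chosen) == len(decisions):
            return chosen
        for option in decisions[len(chosen)]:
            result = search(chosen + (option,))
            if result is not None:
                return result
        return None

    return search(())


# The standing brick and the tilted brick, both given as examples of X.
examples = [{"standing"}, set()]          # the tilted brick has no posture pointer


def agrees_with_examples(chosen):
    for interpretation in chosen:
        for pointers in examples:
            if interpretation == "must-be-standing" and "standing" not in pointers:
                return False
            if interpretation == "must-not-be-lying" and "lying" in pointers:
                return False
    return True


result = choose_interpretations(
    [["must-be-standing", "must-not-be-lying"]], agrees_with_examples)
```

The tilted brick refutes the primary hypothesis, so the search backs up and settles on must-not-be-lying.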

I mentioned just now the role of a shrewd teacher. I
regard the dependence on a teacher as a feature of this theory.
Too often in the past history of machine learning theory the use
of a teacher was considered cheating and mechanisms were instead
expected to self-organize their way to understanding by way of
evolutionary trial and error, or reinforcement, or whatever.
This ignores the very real fact that humans as well as machines
learn very little without good teaching. The first attempt
should be to understand the kind of learning that is at once the
most common and the most useful.

Figure 22

A training sequence that leads to backup.

It is clear that the system assimilates new models from
the teacher and it is in fact dependent on good teaching, but it
depends fundamentally on its own good judgement and previously
learned ideas to understand and disentangle what the teacher has
in mind. It must itself deduce what are the salient ideas in the
training sequence and it must itself decide on an augmentation of
the model which captures those ideas. By carefully limiting the
teacher to the presentation of a sequence of samples, low level
rote learning questions are avoided while allowing study of the
issues which underlie all sorts of meaningful learning, including
interesting forms of direct telling.
Identification 

Having developed the theory of learning models, I shall
say a little about using them in identification. Since this
subject is both tangential to the main thrust and documented
elsewhere (Winston 1970), I shall merely give the highlights
here.

To begin with, identification is done in a variety of
modes, our system already exhibiting the following three:

1. We may present a scene and ask the system to
identify it.

2. We may present a scene with several concepts
represented and ask the system to identify all of them.

3. We may ask if a given scene contains an instance of
something.

Of course, the first mode of identifying a whole scene is
the easiest. We simply insist that 1) all the model's must-be-type
pointers are present in the scene's description and 2) all the
model's must-not-be-type pointers are absent. For
further refinement, we look at all other differences between the
model and scene of other than the emphatic variety and judge the
firmness of model identification according to their number and
type.
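
The first mode can be sketched in Python as follows. As before this is only an illustration under assumptions: model and scene are sets of (node, relation, node) triples whose nodes are taken to be already matched, emphatic pointers are marked by the prefixes must-be- and must-not-, and firmness is reduced to a simple count of the remaining differences.

```python
def identify(model, scene):
    """Test a scene description against a model.

    Returns (True, n) if every must-be- pointer of the model is present
    in the scene and every must-not- pointer is absent, where n counts
    the non-emphatic model pointers missing from the scene (fewer means
    a firmer identification). Returns (False, None) otherwise.
    """
    other_differences = 0
    for a, rel, b in model:
        if rel.startswith("must-not-"):
            if (a, rel[len("must-not-"):], b) in scene:
                return (False, None)      # forbidden relation present
        elif rel.startswith("must-be-"):
            if (a, rel[len("must-be-"):], b) not in scene:
                return (False, None)      # essential relation missing
        elif (a, rel, b) not in scene:
            other_differences += 1
    return (True, other_differences)


arch_model = {
    ("C", "must-be-supported-by", "A"),
    ("C", "must-be-supported-by", "B"),
    ("A", "must-not-touch", "B"),
    ("C", "is-a", "brick"),
}
wedge_topped = {("C", "supported-by", "A"), ("C", "supported-by", "B"),
                ("C", "is-a", "wedge")}
closed_up = wedge_topped | {("A", "touch", "B")}
```

Under these assumptions the wedge-topped scene is accepted with one non-emphatic difference, while the scene whose supports touch is rejected outright.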

When a scene contains many identifiable rows, stacks, or
other groups, we must modify the identification program to allow
for the possibility that essential relations may be missing
because of obscuring objects. The properties of rows and stacks
tend to propagate from the most observable member unless there is
contrary evidence.

The last task, that of searching a scene for a particular
concept, is a wide open question. The method now is to simply
feed our network matching program both the model and the larger
network and hope for the best. If some objects are matched
against corresponding parts of the model, their pointers to other
extraneous objects are forgotten, and the identification routine
is applied. Much remains to be done along the lines of guiding
the match contextually to the right part of the scene.

COPYING TOY BLOCK STRUCTURES

I here give a brief description of the system's higher
level functions along with a scenario giving their interaction in
a very simple situation. The main purpose is to illustrate the
top down, goal oriented and environment dependent flavor of the
system. Code samples are available elsewhere (Winston 1971).

Figure 23 shows the possible call paths between some of
the programs. Note in particular the network quality that
distinguishes the system from the earlier pass oriented metaphor.

Clarity requires that only a portion of the system be
described. In particular, the diagram and the discussion omit
the following:

1) A large number of antecedent and erasing programs
which keep the blocks world model up to date.

2) A large network of programs which find skeletons and
locate lines with particular characteristics.

3) A large network of programs that uses the group -
hypothesize - criticize idea to find otherwise
inaccessible properties of hidden objects.

4) A network of programs that jiggles an object if the
arm errs too much when placing it.
The Functions 

COPY
As Figure 23 shows, COPY simply activates programs that
handle the two phases of a copying problem; namely, it calls for
the spare parts to be found and put away into the spare parts
warehouse area, and it initiates the replication of the new
scene.

STORE-PARTS
To disassemble a scene and store it, STORE-PARTS loops
through a series of operations. It calls appropriate routines
for selecting an object, finding a place for it, and for enacting
the movement to storage.

CHOOSE-TO-REMOVE
The first body examined by CHOOSE-TO-REMOVE comes
directly from a successful effort to amalgamate some regions into
a body using FIND-NEW-BODY. After some body is created,
CHOOSE-TO-REMOVE uses FIND-BELOW to make sure it is not underneath
something. Frequently, some of the regions surrounding a newly
found body are not yet connected to bodies, so FIND-BELOW has a
request link to BIND-REGION. (The bodies so found, of course,
are placed in the data base and are later selected by
CHOOSE-TO-REMOVE without appeal to FIND-NEW-BODY.)

FIND-NEW-BODY
FIND-NEW-BODY locates some unattached region and sets
BIND-REGION to work on it. BIND-REGION then calls a collection of
programs by Eugene Freuder which do a local parse and make
assertions of the form:

(R17 IS-A-FACE-OF B2)

(B2 IS-A BODY)

These programs appeal to a complicated network of subroutines
that drive line finding and vertex finding primitives around the
scene looking for complete regions (Winston 1972).

FIND-BELOW

As mentioned, some regions may need parsing before it
makes sense to ask if a given object is below something. After
assuring itself that an adjacent region is attached to a body,
FIND-BELOW calls the FIND-ABOVE programs to do the work of
determining if the body originally in question lies below the
object owning that adjacent region.

FIND-ABOVE-1 and FIND-ABOVE-2 and FIND-ABOVE-3

The heuristics implemented in Winston's thesis (Winston
1970) and many of those only proposed there are now working in
the FIND-ABOVE programs. They naturally have a collection of
subordinate programs and a link to BIND-REGION for use in the
event an unbodied region is encountered. The assertions made are
of the form:

(B3 IS-ABOVE B7)

MOVE
To move an object to its spare parts position, the
locations and dimensions are gathered up. Then MANIPULATE
interfaces to the machine language programs driving the arm.
After MOVE succeeds, STORE-PARTS makes an assertion of the form:

(B12 IS-A SPAREPART)

FIND-TOP
The first task in making the location calculations is to
identify line-drawing coordinates of a block's top. Then
FIND-TALLNESS and FIND-ALTITUDE supply other information needed to
properly supply the routine that transforms line-drawing
coordinates to X Y Z coordinates. Resulting assertions are:

(B1 HAS-DIMENSIONS (2.2 3.1 1.7))

(B1 IS-AT (47.0 -17.0 5.2 .3))

where the number lists are of the form:

(<smaller x-y plane dimension> <larger> <tallness>)

(<x coordinate> <y> <z> <angle>)

The x y z coordinates are those of the center of the bottom of
the brick and the angle is that of the long x-y plane axis of the
brick with respect to the x axis. Two auxiliary programs make
the following assertions wherever appropriate:

(B12 HAS-POSTURE {STANDING | LYING})
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(B7 IS-A 




FIND-DIMENSIONS 
This program uses FIND-TOP to get the information 
necessary to convert drawing coordinates to three-dimensional 
coordinates. If the top is totally obscured, then it appeals 
instead to FIND-BOTTOM and FIND-TALLNESS-2. 

SKELETON 
SKELETON identifies connected sets of 3 lines which 
define the dimensions of a brick (Finin 1971) (Finin 1972). It 
and the programs under it are frequently called to find instances 
of various types of lines. 

FIND-TALLNESS-1 
Determining the tallness of a brick requires observation 
of a complete vertical line belonging to it. FIND-TALLNESS-1 
uses some of SKELETON'S repertoire of subroutines to find a good 
vertical. To convert from two-dimensional to three-dimensional 
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coordinates, the altitude of the brick must also be known. 

FIND-TALLNESS-2 
Another program for tallness looks upward rather than 
downward. It assumes the altitude of a block can be found but no 
complete vertical line is present which would give the tallness. 
It tries to find the altitude of a block above the one in 
question by touching it with the hand. Subtracting the altitude 
of the lower block from that of the higher gives the desired 
tallness. 

FIND-ALTITUDE 
FIND-ALTITUDE determines the height of an object's base 
primarily by finding its supporting object or objects. If 
necessary, it will use the arm to try to touch the object's top 
and then subtract its tallness. 
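
The arithmetic behind FIND-ALTITUDE and FIND-TALLNESS-2 is simple enough to state directly. The Python sketch below is illustrative only: the function and parameter names are assumptions, heights are plain numbers in table units, and the vision and arm operations that would supply those numbers are not modeled.

```python
def find_altitude(support_top_height=None, touched_top_height=None, tallness=None):
    """Return the height of an object's base, or None if it cannot be found.

    Prefer the top height of the supporting object. Otherwise fall back
    on the height at which the hand touched the object's top, minus the
    object's tallness.
    """
    if support_top_height is not None:
        return support_top_height
    if touched_top_height is not None and tallness is not None:
        return touched_top_height - tallness
    return None


def find_tallness_2(altitude, altitude_of_block_above):
    """Return the tallness of a block from its own altitude and the
    altitude of the block resting on it."""
    return altitude_of_block_above - altitude
```

For example, a block whose top is touched at height 5.0 and whose tallness is 2.0 has altitude 3.0, and a block sitting on the table (altitude 0.0) beneath a block of altitude 1.7 has tallness 1.7.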

FIND-SUPPORTS 
This subroutine uses FIND-SUPPORT-CANDIDATES to collect
together those objects that may possibly be supports.
FIND-SUPPORT-CANDIDATES decides that a candidate is in fact a support
if its top is known to be as high as that of any other support 
candidate. If the height of a candidate's top is unknown but a 
lower bound on that height equals the height of known supports, 
then ADD-TO-SUPPORTS judges it also to be a valid support. At 
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the moment the system has no understanding of gravity. 

FIND-STORAGE 
Once an object is chosen for removal, FIND-STORAGE checks 
the warehouse area for an appropriate place to put it. 

MAKE-COPY
To make the copy, MAKE-COPY, CHOOSE-TO-PLACE, and
FIND-PART replace STORE-PARTS, CHOOSE-TO-REMOVE and FIND-STORAGE.
Assertions of the form:

(B12 IS-A SPAREPART)
(B2 IS-A-PART-OF COPY)
(B2 IS-ABOVE B1)

are kept up to date throughout by appropriate routines.

CHOOSE-TO-PLACF 
Objects are placed after it is insured that their 
supports are already placed. 
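
The ordering rule used by CHOOSE-TO-PLACE can be sketched as a simple loop that places an object only when all of its supports are already placed. The function name and the dictionary representation of the support relation are hypothetical.

```python
def placement_order(supports):
    """supports maps each object to the set of objects directly under it.
    Returns an order in which every object follows all of its supports."""
    placed = set()
    order = []
    remaining = dict(supports)
    while remaining:
        # Objects whose supports have all been placed are ready.
        ready = sorted(obj for obj, under in remaining.items()
                       if under <= placed)
        if not ready:
            raise ValueError("support relation cannot be satisfied")
        for obj in ready:
            order.append(obj)
            placed.add(obj)
            del remaining[obj]
    return order


# Hypothetical scene: B4 rests on B3, and B3 rests on the table.
order = placement_order({"B4": {"B3"}, "B3": set()})
```

In this sketch the table is represented by an empty support set, so B3 is ready immediately and B4 follows it.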

FIND-PART 
The part to be used from the warehouse is selected so as 
to minimize the difference in dimensions of the matched objects. 
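
A minimal sketch of the selection rule in FIND-PART follows. The text says only that the difference in dimensions is minimized; measuring that difference as a sum of absolute differences over sorted dimensions is an assumption made here, as are the names and the sample numbers.

```python
def dimension_mismatch(dims_a, dims_b):
    """Assumed measure: sum of absolute differences of the sorted dimensions."""
    return sum(abs(x - y) for x, y in zip(sorted(dims_a), sorted(dims_b)))


def find_part(target_dims, warehouse):
    """Return the name of the warehouse part closest in size to the target."""
    return min(warehouse,
               key=lambda name: dimension_mismatch(target_dims, warehouse[name]))


# Hypothetical warehouse contents: part name -> (length, width, height).
warehouse = {"B1": (2.0, 2.0, 4.0), "B2": (1.0, 2.0, 3.0)}
best = find_part((1.0, 2.0, 3.5), warehouse)
```

Sorting the dimensions makes the comparison ignore whether a brick is standing or lying, which may or may not match the original program.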
A Scenario 

In what follows the scene in Figure 24a provides the 
spare parts which first must be put away in the warehouse. The 
scene to be copied is that of Figure 24b. 

COPY 

COPY begins the activities. 
STORE-PARTS 

STORE-PARTS begins supervision of disassembly. 

CHOOSE-TO-REMOVE 
FIND-NEW-BODY 
BIND-REGION 

CHOOSE-TO-REMOVE parses a few regions together into a 
body, B1. A great deal of work goes into finding these regions 
by intelligent driving of low level line and vertex finding 
primitives. 

FIND-BELOW 
BIND-REGION 
FIND-ABOVE 

A check is made to ensure that the body is not below 
anything. Note that B2 is parsed during this phase as required 
for the FIND-ABOVE routines. Unfortunately B1 is below B2 and 
therefore CHOOSE-TO-REMOVE must select an alternative for 
removal. 

FIND-BELOW 
FIND-ABOVE 

B2 was found while checking out B1. CHOOSE-TO-REMOVE now 
notices it in the data base and confirms that it is not below 
anything. 

Figure 24: A source of spare parts and a scene to be copied 

FIND-STORAGE 

FIND-STORAGE finds an empty spot in the warehouse. 

MOVE 

MOVE initiates the work of finding the location and 
dimensions of B2. 

FIND-TOP 

FIND-ALTITUDE 
FIND-SUPPORTS 

FIND-SUPPORT-CANDIDATES 
FIND-TOP-HEIGHT 
FIND-ALTITUDE 
FIND-SUPPORTS 

FIND-SUPPORT-CANDIDATES 
FIND-TOP-HEIGHT 
FIND-TALLNESS-1 
FIND-TALLNESS-1 

FIND-BOTTOM proceeds to nail down location parameters for 
B2. As indicated by the depth of call, this requires something 
of a detour as one must first know B2's altitude, which in turn 
requires some facts about B1. Note that no calls are made to 
FIND-ABOVE routines during this sequence as those programs 
previously were used on both B1 and B2 in determining their 
suitability for removal. 

FIND-DIMENSIONS 

A call to FIND-DIMENSIONS succeeds immediately as the 
necessary facts for finding dimensions were already found in the 
course of finding location. Routines establish that B2 is a 
lying brick. 

MANIPULATE 

MANIPULATE executes the necessary motion. 
CHOOSE-TO-REMOVE 

FIND-BELOW 
FIND-STORAGE 

B1 is established as appropriate for transfer to the 
warehouse. A place is found for it there. 

MOVE 

FIND-TOP 

FIND-DIMENSIONS 

MANIPULATE 

The move goes off straightforwardly, as essential facts 
are in the data base as side effects of previous calculations. 

CHOOSE-TO-REMOVE 
FIND-NEW-BODY 

No more objects are located in the scene. At this point 
the scene to be copied, Figure 24b, is placed in front of the eye 
and analysis proceeds on it. 

MAKE-COPY 

CHOOSE-TO-PLACE 

FIND-NEW-BODY 

BIND-REGION 

B3 is found. 

FIND-BELOW 
BIND-REGION 
FIND-ABOVE 

B3 is established as ready to be copied with a spare 
part. 

FIND-PART 

FIND-DIMENSIONS 
FIND-TOP 

Before a part can be found, B3's dimensions must be 
found. The first program, FIND-TOP, fails. 

FIND-BOTTOM 
FIND-ALTITUDE 
FIND-SUPPORTS 

FIND-SUPPORT-CANDIDATES 

FIND-TOP-HEIGHT 

FIND-DIMENSIONS tries an alternative for calculating 
dimensions. It starts by finding the altitude of the bottom. 

FIND-TALLNESS-2 
FIND-SUPPORTED 
FIND-BELOW 

FIND-ABOVE 
FIND-SUPPORTS 

FIND-SUPPORT-CANDIDATES 

FIND-TALLNESS-2 discovers B4 is above B3. 

FIND-ALTITUDE 
TOUCH-TOP 
FIND-TALLNESS-1 

FIND-ALTITUDE finds B4's altitude by using the hand to 
touch its top and subtracting its tallness. B3's height is found 
by subtracting B3's altitude from that of B4. 

MOVE 

MANIPULATE 

Moving in a spare part for B3 is now easy. B3's location 
was found while dealing with its dimensions. 

CHOOSE-TO-PLACE 

FIND-BELOW 
FIND-PART 

FIND-DIMENSIONS 
FIND-TOP 
MOVE 

MANIPULATE 

Placing a part for B4 is easy as the essential facts are 
now already in the data base. 

CHOOSE-TO-REMOVE 
FIND-NEW-BODY 

No other parts are found in the scene to be copied. 
Success. 
CONCLUDING REMARKS 

This essay began with the claim that the study of vision 
contributes both to artificial intelligence and to a theory of 
vision. Working with a view toward these purposes has occupied 
many years of study at MIT and elsewhere on the toy world of 
simple polyhedra. The progress in semantic rooted scene 
analysis, learning, and copying has now brought us to a plateau 
where we expect to spend some time deciding what the next 
important problems are and where to look for solutions. 

The complete system, which occupies on the order of 
100,000 thirty-six bit words, is authored by direct contributions 
in code from over a dozen people. This essay has not summarized, 
but rather has only hinted at the difficulty and complexity of 
the problems this group has faced. Many important issues have 
not been touched on here at all. Line finding, for example, is a 
task on which everything rests and has by itself occupied more 
effort than all the other work described here (Roberts 196?) 
(Herskovits and Binford 1970) (Griffith 1970) (Horn 1971) (Shirai 
1972). 
ABSTRACT 

This paper describes methods which allow a program to analyze and 
interpret a variety of scenes made up of polyhedra with trihedral 
vertices. Scenes may contain shadows, accidental edge 
alignments, and some missing lines. This work is based on ideas 
proposed initially by Huffman and Clowes; I have added methods 
which enable the program to use a number of facts about the 
physical world to constrain the possible interpretations of a 
line drawing, and have also introduced a far richer set of 
descriptions than previous programs have used. 



This paper was originally published as Vision Flash 29. 




1.0 INTRODUCTION 

How do we ascertain the shapes of unfamiliar objects? 
Why do we so seldom confuse shadows with real things? How do 
we "factor out" shadows when looking at scenes? How are we 
able to see the world as essentially the same whether it is a 
bright sunny day, an overcast day, or a night with only 
streetlights for illumination? In the terms of this paper, 
how can we recognize the identity of figures 1.1 and 1.2? Do 
we use learning and knowledge to interpret what we see, or do 
we somehow automatically see the world as stable and 
independent of lighting? What portions of scenes can we 
understand from local features alone, and what configurations 
require the use of global hypotheses? 

Various theories have been proposed to explain how 
people extract three-dimensional information from scenes 
(Gibson 1950 is an excellent reference). It is well known 
that we get depth and distance information from motion 
parallax and, for objects fairly close to us, from eye focus 
feedback and parallax. But this does not explain how we are 
able to understand the three-dimensional nature of 
photographed scenes. Perhaps we acquire knowledge of the 
shapes of objects by handling them and moving around them, 
and use rote memory to assign shape to those objects when we 
recognize them in scenes. But this does not explain how we 
can perceive the shapes of objects we have never seen before. 
Similarly, the fact that we can tell the shapes of many 
objects from as simple a representation as a line drawing 
shows that we do not need texture or other fine details to 
ascertain shape, though we may of course use texture 
gradients and other details to define certain edges. 

I undertook this research with the belief that it is 
possible to discover rules with which a program can obtain a 
three-dimensional model of a scene, given only a reasonably 
good line drawing of a scene. Such a program might have 
applications both in practical situations and in developing 
better theories of human vision. Introspectively, I do not 
feel that there is a great difference between seeing 
"reality" and seeing line drawings. 

Moreover, there are considerable difficulties both in 
processing stereo images (such as the problem of deciding 
which points on each retina correspond to the same scene 
point; see Guzman 1968, Lerman 1970) and in building a 
system incorporating hand-eye coordination which could be 
used to help explore and disambiguate a scene (Gaschnig 
1971). It seems to me that while the use of range finders, 
multiple light sources to help eliminate shadows (Shirai 
1971), and the restriction of scenes to known objects may all 
prove useful for practical robots, these approaches avoid 
coming to grips with the nature of human perception vis-a-vis 
the implicit three-dimensional information in line drawings 
of real scenes. While I would be very cautious about 
claiming parallels between the rules in my program and human 
visual processes, at the very least I have demonstrated a 
number of capable vision programs which require only fixed, 
monocular line drawings for their operation. 

In this thesis I describe a working collection of 
computer programs which reconstruct three-dimensional 
descriptions from line drawings which are obtained from 
scenes composed of plane-faced objects under various lighting 
conditions. In this description the system identifies shadow 
lines and regions, groups regions which belong to the same 
object, and notices such relations as contact or lack of 
contact between the objects, support and in-front-of/behind 
relations between the objects as well as information about 
the spatial orientation of various regions, all using the 
description it has generated. 
1.1 DESCRIPTIONS 

The overall goal of the system is to provide a precise 
description of a plausible scene which could give rise to a 
particular line drawing. It is therefore important to have a 
good language in which to describe features of scenes. Since 
I wish to have the program operate on unfamiliar objects, the 
language must be capable of describing such objects. The 
language I have used is an expansion of the labeling system 
developed by Huffman (Huffman 1971) in the United States and 
Clowes (Clowes 1971) in Great Britain. 

The language employs labels which are assigned to line 
segments and regions in the scene. These labels describe the 
edge geometry, the connection or lack of connection between 
adjacent regions, the orientation of each region in three 
dimensions, and the nature of the illumination for each 
region (illuminated, projected shadow region, or region 
facing away from the light source). The goal of the program 
is to assign a single label value to each line and region in 
the line drawing, except in cases where humans also find a 
feature to be ambiguous. 

This language allows precise definitions of such 
concepts as supported by, in front of, behind, rests against, 
shadows, is shadowed by, is capable of supporting, leans on, 
and others. Thus, if it is possible to label each feature of 
a scene uniquely, then it is possible to directly extract 
these relations from the description of the scene based on 
this labeling. 
1.2 JUNCTION LABELS 

Much of the program's power is based on access to lists 
of possible line label assignments for each type of junction 
in a line drawing. While a natural language analogy to these 
labels could be misleading, I think that it helps in 
explaining the basic operation of this portion of the 
program. 



If we think of each possible label for a line as a 
letter in the alphabet, then each junction must be 
labeled with an ordered list of "letters" to form a 
legal "word" in the language. Thus each "word" 
represents a physically possible interpretation for a 
given junction. Furthermore, each "word" must match the 
"words" for surrounding junctions in order to form a 
legal "phrase", and all "phrases" in the scene must 
agree to form a legal "sentence" for the entire scene. 
The knowledge of the system is contained in (1) a 
dictionary made up of every legal "word" for each type 
of junction, and (2) rules by which "words" can legally 
combine with other "words". The range of the dictionary 
entries defines the universe of the program; this 
universe can be expanded by adding new entries 
systematically to the dictionary. 
In fact, the "dictionary" need not be a stored list. 
The dictionary can consist of a relatively small list of 
possible edge geometries for each junction type, and a set of 
rules which can be used to generate the complete dictionary 
from the original lists. Depending on the amount of computer 
memory available, it may either be desirable to store the 
complete lists as compiled knowledge or to generate the lists 
when they are needed. In my current program the lists are 
for the most part precompiled. 

The composition of the dictionary is interesting in its 
own right. While some basic edge geometries give rise to 
many dictionary entries, some give rise to very few. The 
total number of entries sharing the same edge geometry can be 
as low as three for some ARROW junctions, including shadow 
edges, while the number generated by some FORK junction edge 
geometries is over 270,000 (including region orientation and 
illumination values). 
1.3 JUNCTION LABEL ASSIGNMENT 

There is a considerable amount of local information 
which can be used to select a subset of the total number of 
dictionary entries which are consistent with a particular 
junction. The first piece of information is already included 
implicitly in the idea of junction type. Junctions are typed 
according to the number of lines which make up the junction 
and the two dimensional arrangement of these lines. Figure 
1.3 shows all the junction types which can occur in the 
universe of the program. The dictionary is arranged by 
junction type, and a standard ordering is assigned to all the 
line segments which make up junctions (except FORKS and 
MULTIS). 

The program can also use local region brightness and 
line segment direction to preclude the assignment of certain 
labels to lines. For example, if it knows that one region is 
brighter than an adjacent region, then the line which 
separates the regions can be labeled as a shadow region in 
only one way. There are other rules which relate region 
orientation, light placement and region illumination as well 
as rules which limit the number of labels which can be 
assigned to line segments which border the support surface 
for the scene. The program is able to combine all these 
types of information in finding a list of appropriate labels 
for a single junction. 

1.4 COMBINATION RULES 

Combination rules are used to select from the initial 
assignments the label, or labels, which correctly describe 
the scene features that could have produced each junction in 
the given line drawing. The simplest type of combination 
rule merely states that a label is a possible description for 
a junction if and only if there is at least one label which 
"matches" it assigned to each adjacent junction. Two 
junction labels "match" if and only if the line segment which 
joins the junctions gets the same interpretation from both of 
the junctions at its ends. 
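
The simplest combination rule can be written down directly. In the Python sketch below a junction label is represented as a dictionary from line names to line interpretations; that representation, the function names, and the "+" and "-" symbols are assumptions for illustration and do not come from the original program.

```python
def labels_match(label_a, label_b, shared_line):
    """Two junction labels match if they give the shared line the same
    interpretation."""
    return label_a[shared_line] == label_b[shared_line]


def label_is_possible(label, neighbors):
    """A label is possible if every adjacent junction still has at least one
    matching label.  neighbors is a list of pairs
    (shared_line, candidate_labels_of_the_adjacent_junction)."""
    return all(
        any(labels_match(label, other, line) for other in others)
        for line, others in neighbors
    )


# Hypothetical junction with two lines, e1 and e2, and two neighbors.
neighbors = [
    ("e1", [{"e1": "+"}]),
    ("e2", [{"e2": "+"}, {"e2": "-"}]),
]
ok = label_is_possible({"e1": "+", "e2": "-"}, neighbors)
rejected = not label_is_possible({"e1": "-", "e2": "+"}, neighbors)
```

The second label is rejected because the neighbor across e1 offers no label that interprets e1 as "-".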

Of course, each interpretation (line label) is really a 
shorthand code for a number of properties of the line and its 
adjoining regions. If the program can show that any one of 
these constituent values cannot occur in the given scene 
context, then the whole complex of values for that line 
expressed implicitly in the interpretation cannot be possible 
either and, furthermore, any junction label which assigns 
this interpretation to the line segment can be eliminated as 
well. Thus, when it chooses a label to describe a 
particular junction, it constrains all the junctions which 
surround the regions touching this junction, even though the 
combination rules only compare adjacent junctions. 

More complicated rules are needed if it is necessary to 
relate junctions which do not share a visible region or line 
segment. For example, I thought at the outset of my work 
that it might be necessary to construct models of hidden 
vertices or features which faced away from the eye in order 
to find unique labels for the visible features. The 
difficulty in this is that unless a program can find which 
lines represent obscuring edges, it cannot know where to 
construct hidden features, but if it needs the hidden 
features to label the lines, it may not be able to decide 
which lines represent obscuring edges. As it turns out, no 
such complicated rules and constructions are necessary in 
general; most of the labeling problem can be solved by a 
scheme which only compares adjacent junctions. 
1.5 EXPERIMENTAL RESULTS 

When I began to write a program to implement the system 
I had devised, I expected to use a tree search system to find 
which labels or "words" could be assigned to each junction. 
However, the number of dictionary entries for each type of 
junction is very high (there are almost 3000 different ways 
to label a FORK junction before even considering the possible 
region orientations!) so I decided to use a sort of 
"filtering program" before doing a full tree search. 

The program computes the full list of dictionary entries 
for each junction in the scene, eliminates from the list 
those labels which can be precluded on the basis of local 
features, assigns each reduced list to its junction, and then 
the filtering program computes the possible labels for each 
line, using the fact that a line label is possible if and 
only if there is at least one junction label at each end of 
the line which contains the line label. Thus, the list of 
possible labels for a line segment is the intersection of the 
two lists of possibilities computed from the junction labels 
at the ends of the line segment. If any junction label would 
assign an interpretation to the line segment which is not in 
this intersection list, then that label can be eliminated 
from consideration. The filtering program uses a network 
iteration scheme to systematically remove all the 
interpretations which are precluded by the elimination of 
labels at a particular junction. 
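
The filtering step can be sketched as a loop that repeats until nothing more can be removed. This is an illustrative Python reconstruction under stated assumptions, not the original MICRO-PLANNER and LISP program: junction labels are dictionaries from line names to interpretations, and the names `filter_labels`, `scene_lines` and `scene_candidates` are hypothetical.

```python
def filter_labels(candidates, lines):
    """Discard junction labels until every remaining line interpretation is
    offered by the junctions at both ends of the line.

    candidates: {junction: [ {line: interpretation, ...}, ... ]}
    lines:      {line: (junction_a, junction_b)}
    """
    remaining = {j: list(labels) for j, labels in candidates.items()}
    changed = True
    while changed:
        changed = False
        for line, (a, b) in lines.items():
            # Intersection of the interpretations offered at the two ends.
            allowed = ({lab[line] for lab in remaining[a]}
                       & {lab[line] for lab in remaining[b]})
            for j in (a, b):
                kept = [lab for lab in remaining[j] if lab[line] in allowed]
                if len(kept) != len(remaining[j]):
                    remaining[j] = kept
                    changed = True   # a removal may affect other lines
    return remaining


# Hypothetical three-junction chain: J1 -e1- J2 -e2- J3.
scene_lines = {"e2": ("J2", "J3"), "e1": ("J1", "J2")}
scene_candidates = {
    "J1": [{"e1": "+"}],
    "J2": [{"e1": "+", "e2": "-"}, {"e1": "-", "e2": "+"}],
    "J3": [{"e2": "-"}, {"e2": "+"}],
}
filtered = filter_labels(scene_candidates, scene_lines)
```

In this small example the single label at J1 eliminates one label at J2, and that elimination in turn removes one label at J3 on the next pass, which is the kind of propagation the network iteration is meant to capture.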

When I ran this filtering program I was amazed to find 
that in the first few scenes I tried, this program found a 
unique label for each line. Even when I tried considerably 
more complicated scenes, there were only a few lines in 
general which were not uniquely specified, and some of these 
were essentially ambiguous, i.e. I could not decide exactly 
what sort of edge gave rise to the line segment myself. The 
other ambiguities, i.e. the ones which I could resolve 
myself, in general require that the program recognize lines 
which are parallel or collinear or regions which meet along 
more than one line segment, and hence require more global 
agreement. 

I have been able to use this system to investigate a 
large number of line drawings, including ones with missing 
lines and ones with numerous accidentally aligned junctions. 
From these investigations I can say with some certainty which 
types of scene features can be handled by the filtering 
program and which require more complicated processing. 
Whether or not more processing is required, the filtering 
system provides a computationally cheap method for acquiring 
a great deal of information. For example, in most scenes a 
large percentage of the line segments are unambiguously 
labeled, and more complicated processing can be directed to 
the areas which remain ambiguous. As another example, if I 
only wish to know which lines are shadows or which lines are 
the outside edges of objects or how many objects there are in 
the scene, the program may be able to get this information 
even though some ambiguities remain, since the ambiguity may 
only involve region illumination type or region orientation. 

Figure 1.4 shows some of the scenes which the program is 
able to handle. The segments which remain ambiguous after 
its operation are marked with stars, and the approximate 
amount of time the program requires to label each scene is 
marked below it. The computer is a PDP-10, and the program 
is written partially in MICRO-PLANNER (Sussman et al 1971) 
and partially in compiled LISP. 
1.6 COMPARISON WITH OTHER VISION PROGRAMS 

My system differs from previously proposed ones in 
several important ways: 

First, it is able to handle a much broader range of 
scene types than have previous programs. The program 
"understands" shadows, some junctions which have missing 
lines, and apparent alignment of edges caused by the 
particular placement of the eye with respect to the scene, so 
that no special effort needs to be made to avoid problematic 
features. 

Second, the design of the program facilitates its 
integration with line-finding programs and higher-level 
programs such as programs which deal with natural language or 
overall system goals. The system can be used to write a 
program which automatically requests and uses many different 
types of information to find the possible interpretations for 
a single feature or portion of a scene. 

Third, the program is able to deal with ambiguity in a 
natural manner. Some features in a scene can be ambiguous to 
a person looking at the same scene and the program preserves 
these various possibilities. This tolerance for ambiguity is 
central to the philosophy of the program; rather than trying 
to pick the "most probable" interpretation of any features, 
the program operates by trying to eliminate impossible 
interpretations. If it has been given insufficient 
information to decide on a unique possibility, then it 
preserves all the active possibilities it knows. Of course 
if a single interpretation is required for some reason, one 
can be chosen from this list by heuristic rules. 

Fourth, the program is algorithmic and does not require 
facilities for back-up if the filter program finds an 
adequate description. Heuristics have been used in all 
previous vision programs to approximate reality by the most 
likely interpretation. This may simplify some problems, but 
sophisticated programs are needed to patch up the cases where 
the approximation is wrong; in my program I have used as 
complete a description as I could devise with the result that 
the programs are particularly simple, transparent and 
powerful. 

Fifth, because of this simplicity, I have been able to 
write a program which operates very rapidly. As a practical 
matter this is very useful for debugging the system, and 
allows modifications to be made with relative ease. 
Moreover, because of its speed, I have been able to test the 
program on many separate line drawings and have thus been 
able to gain a clearer understanding of the capabilities and 
ultimate limitations of the program. In turn, this 
understanding has led and should continue to lead to useful 
modifications and a greater understanding of the nature and 
complexity of procedures necessary to handle various types of 
scene features. 

Sixth, as explained in the next section, the descriptive 
language provides a theoretical foundation of considerable 
value in explaining previous work. 

1.7 HISTORICAL PERSPECTIVE 

One of the great values of the extensive descriptive 
apparatus I have developed is its ability to explain the 
nature and shortcomings of past work. I will discuss in 
Chapter 9 how my system helps in understanding the work of 
Guzman (Guzman 1968), Rattner (Rattner 1970), Huffman 
(Huffman 1971), Clowes (Clowes 1971), and Orban (Orban 1970); 
and to explain portions of the work of Winston (Winston 1970) 
and Finin (Finin 1971a, 1971b). For example, I show how 
various concepts such as support can be formalized in my 
descriptive language. From this historical comparison 
emerges a striking demonstration of the ability of good 
descriptions to both broaden the range of applicability of a 
program, and simplify the program structure. 

1.8 IMPLICATIONS FOR HUMAN PERCEPTION 

My belief that the rules which govern the interpretation 
of a line drawing should be simple is based on the subjective 
impression that little abstraction or processing of any type 
seems to be required for me to be able to recognize the 
shadows, object edges, etc. in such a drawing, in cases 
where the drawing is reasonably simple and complete. I do 
not believe that human perceptual processes necessarily 
resemble the processes in my program, but there are various 
aspects of my solution which appeal to my intuition about the 
nature of that portion of the problem which is independent of 
the type of perceiver. I think it is significant that my 
program is as simple as it is, and that the information 
stored in it is so independent of particular objects. 
Back-up is not necessary in general; the system works for picture 
fragments as well as for entire scenes; the processing time 
required is proportional to the number of line segments and 
not an exponential function of the number; all these facts 
lead me to believe that my research has been in the right 
directions. 
2.0 QUICK SYNOPSIS 

This chapter provides a quick look at some of the 
technical aspects of my work. It provides a synopsis of work 
covered more fully in my thesis (A.I. TR-271). 

2.1 THE PROBLEM 

In what follows I frequently make a distinction between 
the scene itself (objects, table, and shadows) and the 
retinal representation of the scene as a two-dimensional line 
drawing. I will use the terms vertex, edge and surface to 
refer to the scene features which map into junction, line and 
region respectively in the line drawing. 

Our first subproblem is to develop a language that 
allows us to relate these two worlds. I have done this by 
assigning names called labels to lines in the line drawing, 
after the manner of Huffman (Huffman 1971) and Clowes (Clowes 
1971). Thus, for example, in figure 2.1 line segment J1-J2 
is labeled as a shadow edge, line J2-J3 is labeled as a 
concave edge, line J3-J14 is labeled as a convex edge, line 
J4-J5 is labeled as an obscuring edge and line J12-J13 is 
labeled as a crack edge. Thus, these terms are attached to 
parts of the drawing, but they designate the kinds of things 
found in the three-dimensional scene. 

When we look at a line drawing of this sort, we usually 
can easily understand what the line drawing represents. In 
terms of a labeling scheme either (1) we are able to assign 
labels uniquely to each line, or (2) we can say that no such 
scene could exist, or (3) we can say that although it is 
impossible to decide unambiguously what the label of an edge 
should be, it must be labeled with one member of some 
specified subset of the total number of labels. What 
knowledge is needed to enable the program to reproduce such 
labeling assignments? 

Huffman and Clowes provided a partial answer in their 
papers. They pointed out that each type of junction can 
only be labeled in a few ways, and that if we can say with 
certainty what the label of one particular line is, we can 
greatly constrain all other lines which intersect that line 
segment at its ends. As a specific example, if one branch of 
an L junction is labeled as a shadow edge, then the other 
branch must be labeled as a shadow edge as well. 

Moreover, shadows are directional, i.e. in order to 
specify a shadow edge, it must not only be labeled "shadow" 
but must also be marked to indicate which side of the edge is 
shadowed and which side is illuminated. Therefore, not only 
the type of edge but the nature of the regions on each side 
can be constrained. 

These facts can be illustrated in a jigsaw puzzle 
analogy, shown in figure 2.2. Given the five different edge 
types I have discussed so far, there are seven different ways 
to label any line segment. This implies that if all line 
labels could be assigned independently there would be 7^2 = 49 
different ways to label an L, 7^3 = 343 ways to label a 
three-line junction, etc. In fact there are only 9 ways in which 
real scene features can map into Ls on a retinal projection. 
Table 2.1 summarizes the ways in which junctions can be 
assigned labelings from this set. In figure 2.3, I show all 
the possible labelings for each junction type, limiting 
myself to vertices which are formed by no more than three 
planes (trihedral vertices) and to junctions of five or fewer 
lines. In Chapter 3 I explain how to obtain the junctions in 
figure 2.3; I do not expect that it should be obvious to you 
how one could obtain these junctions. In general, for 
clarity, I have tried to use the word labeling to refer to 
the simultaneous assignment of a number of line labels. 
Labels thus refer to line interpretations, and labelings 
refer to junction or scene interpretations. 
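
The counts quoted above can be checked by brute enumeration. In this Python sketch the seven label names are an assumption: the text gives five edge types and seven line labels and says that shadow edges are directional, so the split used here (two directed shadow labels, two directed obscuring labels, and one label each for convex, concave and crack) is only one reading.

```python
from itertools import product

# Assumed breakdown of the seven line labels (names are illustrative only).
LINE_LABELS = [
    "convex", "concave", "crack",
    "obscuring-a", "obscuring-b",
    "shadow-a", "shadow-b",
]


def unconstrained_labelings(n_lines):
    """Count labelings of an n-line junction if every line were independent."""
    return sum(1 for _ in product(LINE_LABELS, repeat=n_lines))


ways_for_L = unconstrained_labelings(2)          # 7^2
ways_for_three_lines = unconstrained_labelings(3)  # 7^3
```

Against the 49 independent combinations for an L, the text reports that only 9 correspond to real scene features, which is the source of the constraint the labeling scheme exploits.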
2.2 SOLVING THE LABEL ASSIGNMENT PROBLEM 

Labels can be assigned to each line segment by a tree 
search procedure. In terms of the jigsaw puzzle analogy, 
imagine that we have the following items: 

1. A board with channels cut to represent the line 
drawing; the board space can accept only L pieces at each 
place where the line drawing has an L, only ARROW pieces 
where the line drawing has an ARROW, etc. Next to each 
junction are three bins, marked "junction number", "untried 
labels", and "tried labels". 

2. A full set of pieces for every space on the board. 
If the line drawing represented by the board has five Ls then 
there are five full sets of L pieces with nine pieces in each 
set. 

3. A set of junction number tags marked J1, J2, J3, 
..., Jn, where n is the number of junctions on the board. 

4. A counter which can be set to any number between 1 
and n. 

The tree search procedure can then be visualized as 
follows: 

Step 1: Fame each iurcticn by placinp a ;'urction numler tar 
ir each bin marked "junction rumber". 

Step 2: Place a full set of the appropriate tyre of pieces 
ir the "untried labels" bin of each 'junction. 

Step Jz Set the counter to 1. From here or in Kc will be 
used to refer to the current value of the counter. Thus if 
the counter is set to 6, then J(Nc) = 6. 

Step l> iry to place the top. piece from the "untried labels" 
b:r cf junction J(ilc) in beard space J(T;c). There are 
several pcssihle cutccmes: 
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<A« If the piece c?-r be placed (i.e. tie piece matches 
all adjacent rieces already placed, if air-), then 

A1.. If Nc < n, increase the counter by cne and 

repeat Step 4* 

/2. If Kc = n, then the rieces row en the beard 
represent one possible labeling for x the line drawing. If 
this is true then 

i. V/rite dewr or otherwise remember the 
labeling, and 

ii. Transfer the piece in sra.ee n bach into the 
n-th Tf untried labels" bin, and 

: ii. Go to Step 5 # 

4E. If the riece cannot be placed, put it in the "tried 

labels" bin ard repeat Step 4. 

AC. If there are no mere pieces in the "untried labels" 

bin, then 

C2. If Nc = 1 , we have found all (if any) possible 
labelings, and the procedure is DCNE. 

C2. Otherwise, go tc Step 5* 

Step ^ : lo all the following steps: 

i. Transfer all the pieces from the Nc-th 
"tried labels" bin into the Nc-th "untried labels" bin, and 

ii. Transfer the piece in space Nc-1 into its 
"tried labels" bin, and 

iii* Set the counter to Nc-1, ard go to Step 4* 
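
The bin-and-counter procedure above amounts to a depth-first
backtracking search. The following is only a minimal sketch of
that idea, not the program described in this report. The names
tree_search and make_fits, the representation of a piece as a
tuple with one line label per branch, and the rule that two
pieces match when they give a shared line the same label are all
assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Label = Tuple[str, ...]  # assumed: one line label per branch of a junction


def make_fits(lines):
    """lines: (junction_a, branch_a, junction_b, branch_b) per shared line.

    Assumption: two pieces match when both give the shared line the
    same label.
    """
    def fits(nc: int, piece: Label, placed: Dict[int, Label]) -> bool:
        for a, ba, b, bb in lines:
            if a == nc and b in placed and placed[b][bb] != piece[ba]:
                return False
            if b == nc and a in placed and placed[a][ba] != piece[bb]:
                return False
        return True
    return fits


def tree_search(
    candidates: Sequence[Sequence[Label]],
    fits: Callable[[int, Label, Dict[int, Label]], bool],
) -> List[List[Label]]:
    """Return every complete labeling, trying pieces in the order given."""
    n = len(candidates)
    placed: Dict[int, Label] = {}
    found: List[List[Label]] = []

    def place(nc: int) -> None:
        if nc == n:                      # every space filled (outcome A2)
            found.append([placed[i] for i in range(n)])
            return
        for piece in candidates[nc]:     # the "untried labels" bin
            if fits(nc, piece, placed):  # outcome A: piece can be placed
                placed[nc] = piece
                place(nc + 1)
                del placed[nc]           # Step 5: back up, try the next piece
            # outcome B: the piece is set aside and the loop continues

    place(0)
    return found


# Two junctions joined by one line; each piece has a single branch.
example = tree_search(
    [[("+",), ("-",)], [("-",), ("+",)]],
    make_fits([(0, 0, 1, 0)]),
)
```

Recursion takes the place of the counter Nc, and the loop over
candidates plays the role of moving pieces between the "untried"
and "tried" bins.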

To see how this procedure works in practice, see figure
2.4. For this example assume that the pieces are piled so
that the order in which they are tried is the same as the
order in which the pieces are listed in figure 2.3. The
example is carried out only as far as the first labeling
obtained by the procedure. There is, of course, at least one
other labeling, namely the one we could assign by inspection.
The "false" labeling found first could be eliminated in this
case by a program if it knew that R3 is brighter than R1 or
that R2 is brighter than R1. It could then use heuristics
which only allow it to fit a shadow edge in one orientation,
given the relative illumination on both sides of a line.
However, if the object happened to have a darker surface than
the table, this heuristic would not help.

Clearly this procedure leaves many unsolved problems.
In general there will be a number of possible labelings from
which a program must still choose one. What rules can it use
to make the choice? Even after choosing a labeling, in order
to answer questions (about the number of objects in the
scene, about which edges are shadows, about whether or not
any objects support other objects, etc.) a program must use
rules of some sort to deduce the answers from the information
it has.

I will argue that what is needed to find a single
reasonable interpretation of a line drawing is not a more
clever set of rules or theorems to relate various features of
the line drawing, but merely a better description of the
scene features. In fact, it turns out that we can use a
parsing procedure which involves less computation than the
tree search procedure.

2.3 BETTER EDGE DESCRIPTION

So far I have classified edges only on the basis of
geometry (concave, convex, obscuring or planar) and have
subdivided the planar class into crack and shadow sub-
classes. Suppose that I further break down each class
according to whether or not each edge can be the bounding
edge of an object. Objects can be bounded by obscuring
edges, concave edges, and crack edges. Figure 2.5 shows the
results of appending a label analogous to the "obscuring
edge" mark to crack and concave edges. This approach is
similar to one first proposed by Freuder (Freuder 1971a).

Each region can also be labeled as belonging to one of
the three following classes:

I - Illuminated directly by the light source.

SP - A projected shadow region; such a region would be
illuminated if no object were between it and the light
source.

SS - A self-shadowed region; such a region is oriented
away from the light source.

Given these classes, I can define new edge labels which
also include information about the lighting on both sides of
the edge. Notice that in this way I can include at the edge
level, a very local level, information which constrains all
edges bounding the same two regions. Put another way,
whenever a line can be assigned a single label which includes
this lighting information, then a program has powerful
constraints for the junctions which can appear around either
of the regions which bound this line.

Figure 2.6 is made up of tables which relate the region
illumination types which can occur on both sides of each edge
type. For example, if either side of a concave or crack edge
is illuminated, both sides of the edge must be illuminated.

These tables can be used to expand the set of allowable
junction labels; the new set of labels can have a number of
entries which have the same edge geometries but which have
different region illumination values. It is very easy to
write a program to expand the set of labelings; the
principles of its operation are (1) each region in a given
junction labeling can have only one illumination value of the
three, and (2) the values on either side of each line of the
junction must satisfy the restrictions in the tables of
figure 2.6.
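
A minimal sketch of such an expansion program follows. It is an
illustration under stated assumptions, not the program used for
this work. The tables of figure 2.6 are not reproduced here, so
ALLOWED is a hypothetical stand-in: only the concave/crack rule
comes from the text, and the convex entry is left unconstrained
purely for the sake of the example. The layout (edge i of a
junction separates region i from region i+1, cyclically) is also
an assumption.

```python
from itertools import product

ILLUMINATION = ("I", "SP", "SS")
ALL_PAIRS = set(product(ILLUMINATION, repeat=2))

# From the text: if either side of a concave or crack edge is
# illuminated, both sides must be illuminated.
BOTH_OR_NEITHER_LIT = {(a, b) for a, b in ALL_PAIRS if (a == "I") == (b == "I")}

# Hypothetical stand-in for the tables of figure 2.6.
ALLOWED = {
    "concave": BOTH_OR_NEITHER_LIT,
    "crack": BOTH_OR_NEITHER_LIT,
    "convex": ALL_PAIRS,  # assumption: unconstrained in this sketch
}


def expand_labeling(edge_types, allowed):
    """List every illumination assignment consistent with the tables.

    edge_types[i] is the type of the edge separating region i from
    region (i + 1) % k around the junction (assumed layout).
    """
    k = len(edge_types)
    expanded = []
    # Principle (1): each region takes exactly one of the three values.
    for regions in product(ILLUMINATION, repeat=k):
        # Principle (2): every edge must satisfy its table.
        if all(
            (regions[i], regions[(i + 1) % k]) in allowed[edge_types[i]]
            for i in range(k)
        ):
            expanded.append(regions)
    return expanded


new_labels = expand_labeling(["concave", "concave", "convex"], ALLOWED)
```

With the placeholder tables, one old labeling with two concave
edges and one convex edge expands into several new labelings that
share the same edge geometry and differ only in region
illumination.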

An interesting result of this further subdivision of the
line labels is that, with four exceptions, each shadow-
causing junction has only one possible illumination parsing,
as shown in figure 2.7. Thus whenever a scene has shadows
and whenever a program can find a shadow-causing junction in
such a scene, it can greatly constrain all the lines and
regions which make up this junction. In figure 2.7 I have
also marked each shadow edge which is part of a shadow-
causing junction with an "L" if the arrow on the shadow edge
points counter-clockwise and an "R" if the arrow points
clockwise. No "L" shadow edge can match an "R" shadow edge,
corresponding to the physical fact that it is impossible for
a shadow edge to be caused from both of its ends.

There are two extreme possibilities that this
partitioning may have on the number of junction labelings now
needed to describe all real vertices:

(1) Each old junction label which has n concave edges, m
crack edges, p clockwise shadow edges, q counterclockwise
shadow edges, s obscuring edges and t convex edges will have
to be replaced by (20)^n (6)^m (3)^p (3)^q (9)^s (8)^t new
junctions, or

(2) Each old junction will give rise to only one new
junction (as in the shadow-causing junction cases).
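
As a worked illustration of extreme (1), the count can be
computed as a product with one factor per edge of the old label.
This is a sketch under assumptions: it reads the expression in
(1) as such a product, takes twenty as the number of new labels
for a concave edge (the figure used later in this section), and
assumes the counterclockwise shadow count equals the clockwise
one. The names SUBLABELS and worst_case_count are hypothetical.

```python
from math import prod

# Assumed number of new labels per old edge type, as read from (1).
SUBLABELS = {
    "concave": 20,
    "crack": 6,
    "shadow_cw": 3,
    "shadow_ccw": 3,   # assumption: same as the clockwise case
    "obscuring": 9,
    "convex": 8,
}


def worst_case_count(edge_kinds):
    """New junction labels replacing one old label under extreme (1)."""
    return prod(SUBLABELS[kind] for kind in edge_kinds)


# An old label with two convex edges and one concave edge.
count = worst_case_count(["convex", "convex", "concave"])
```

Under these assumptions a single old three-line label would
already expand into more than a thousand new ones, which is why
extreme (1) would make the partition worthless.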

If (1) were true then the partition would be worthless,
since no new information could be gained. If (2) were true,
the situation would be greatly improved, since in a sense all
the much more precise information was implicitly included in
the original junctions but was not explicitly stated.
Because the information is now more explicitly stated, many
matches between junctions can be precluded; for example, if
in the old scheme some line segment L1 of junction label Q1
could have been labeled concave, as could line segment L2 of
junction label Q2, a line joining these two junctions could
have been labeled concave. But in the new scheme, if each
junction label gives rise to a single new label, both L1 and
L2 would take on one of the twenty possible values for a
concave edge. Unless both L1 and L2 gave rise to the same
new label, the line segment could not be labeled concave
using Q1 and Q2. The truth lies somewhere between the two
extremes, but the fact that it is not at the extreme of (1)
means that there is a net improvement. In Table 2.2 I
compare the situation now to cases (1) and (2) above and also
to the situation depicted in Table 2.1.

I have also used the better descriptions to express the
restriction that each scene is assumed to be on a horizontal
table which has no holes in it and which is large enough to
fill the retina. This means that any line segment which
separates the background (table) from the rest of the scene
can only be labeled as shown in figure 2.8. Because of this
fact the number of junction labels which could be used to
label junctions on the scene/background boundary can be
greatly restricted.

The value of a better description should be immediately
apparent. In the old classification scheme three out of the
seven line labels could appear on the scene/background
boundary, whereas in the new classification, only seven out
of fifty-seven labels can occur. Moreover, since each junction
must have two of its line segments bounding any region, the
fraction of junctions which can be on the scene/background
boundary has improved roughly from (3/7)(3/7) = 9/49 = 18.4%
to (7/57)(7/57) = 49/3249 = 1.5%. The results of these
improvements will become obvious in the next section.

2.4 PROGRAMMING CONSEQUENCES

There are so many possible labels for each type of
junction that I decided to begin programming a labeling
system by writing a sort of filtering program to eliminate as
many junction labels as possible before beginning a tree
search procedure.

The filter procedure depends on the following
observation, given in terms of the jigsaw puzzle analogy:

Suppose that we have two junctions, J1 and J2, which are
joined by a line segment L-J1-J2. J1 and J2 are
represented by adjacent spaces on the board and the
possible labels for each junction by two stacks of
pieces. Now for any piece M in J1's stack either (1)
there is a matching piece N in J2's stack or (2) there
is no such piece. If there is no matching piece for M
then M can be thrown away and need never be considered
again as a possible junction label.

The filter procedure below is a method for
systematically eliminating all junction labels for which
there can never be a match. All the equipment is the same as
that used in the tree search example, except that this time I
have added a card marked "junction modified" on one side and
"no junction modified" on the other.

Step 1: Put a junction number tag between 1 and n in
each "junction number" bin. Place a full set of pieces
in the "untried labels" bin of each junction.

Step 2: Set the counter to Nc = 1, and place the card
so that it reads "no junction modified".

Step 3: Check the value of Nc:

A. If Nc = n + 1, and the card reads "no junction
modified" then go to SUCCEED.

B. If Nc = n + 1, and the card reads "junction
modified" then go to Step 2. (At least one piece was
thrown away on the last pass, and therefore it is
possible that other pieces which were kept only because
this piece was present will now have to be thrown away
also.)

C. Otherwise, go to Step 4.

Step 4: Check the "untried labels" bin of junction
J(Nc):

A. If there are no pieces left in the Nc-th
"untried labels" bin, then

A1. If there are no pieces in the Nc-th
"tried labels" bin, go to FAILURE.

A2. Otherwise, transfer the pieces from the
Nc-th "tried labels" bin back into the Nc-th "untried
labels" bin, add 1 to the counter (Nc) and go to Step 3.

B. If there are pieces left in the Nc-th "untried
labels" bin, take the top piece from the bin and place
it in the board, and go to Step 5.

Step 5: Check the spaces adjacent to space Nc:

A. If the piece in the Nc-th space has matching
pieces in each neighboring junction space, transfer the
piece from space Nc into the Nc-th "tried labels" bin,
and transfer the pieces from the neighboring spaces and
the neighboring "tried labels" bins back into their
"untried labels" bins.

B. If there are empty neighboring spaces, then

B1. If there are no more junctions in the
neighboring "untried labels" bins which could fit with
the piece in space Nc, then that piece is not a possible
label. Throw it away, and arrange the card to read
"junction modified" if it doesn't already.

B2. Try pieces from the neighboring "untried
labels" piles until either a piece fits or the pile is
exhausted, and then go to Step 5 again.

SUCCEED: The pieces in the "untried labels" bins of
each junction have passed the filtering routine and
constitute the output of this procedure.

FAILURE: There is no way to label the scene given the
current set of pieces.
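
The same elimination can be sketched compactly in code. This is
a simplified illustration of the multi-pass idea (repeat until a
full pass throws nothing away), not a transcription of the bin
procedure and not the program actually written. The function
name waltz_filter, the tuple representation of pieces, and the
rule that two pieces match when they agree on the label of the
shared line are assumptions.

```python
def waltz_filter(stacks, lines):
    """Discard junction labels that have no match at some neighbor.

    stacks: junction -> list of pieces (tuples, one line label per branch).
    lines:  (a, branch_a, b, branch_b) for each line joining a and b.
    Returns the filtered stacks, or None if some junction has no label left.
    """
    stacks = {j: list(pieces) for j, pieces in stacks.items()}
    modified = True                    # the "junction modified" card
    while modified:
        modified = False               # card reads "no junction modified"
        for a, ba, b, bb in lines:
            for here, bh, there, bt in ((a, ba, b, bb), (b, bb, a, ba)):
                # Labels the neighbor can still give the shared line.
                offered = {piece[bt] for piece in stacks[there]}
                kept = [p for p in stacks[here] if p[bh] in offered]
                if len(kept) != len(stacks[here]):
                    stacks[here] = kept    # unmatched pieces thrown away
                    modified = True
    if any(not pieces for pieces in stacks.values()):
        return None                    # FAILURE
    return stacks                      # SUCCEED


# A chain J1 - J2 - J3; J2 has two branches, the others one.
result = waltz_filter(
    {1: [("+",), ("-",)], 2: [("+", "a"), ("x", "b")], 3: [("a",), ("c",)]},
    [(1, 0, 2, 0), (2, 1, 3, 0)],
)
```

Each pass can only shrink a stack, so the loop terminates; a
second pass is needed whenever a piece survived only because of a
neighbor's piece that was later discarded.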

In the program I wrote, I used a somewhat more complex
variation of this procedure which only requires one pass
through the junctions. This procedure is similar to the one
used to generate figure 2.9, and is described below.



When I ran the filter program on some simple line
drawings, I found to my amazement that the filter procedure
yielded unique labels for each junction in most cases! In
fact in every case I have tried, the results of this
filtering program are the same results which would be
obtained by running a tree search procedure, saving all the
labelings produced, and combining all the resulting
possibilities for each junction. In other words, the filter
program in general eliminates all labels except those which
are part of some tree search labeling for the entire scene.

FIGURE 2.9

It is not obvious that this should be the case. For
example, if this filter procedure is applied to the simple
line drawing shown in figure 2.4 using the old set of labels
given in figure 2.3, it produces the results shown in figure
2.9. In this figure, each junction has labels attached which
would not be part of any total labeling produced by a tree
search. This figure is obtained by going through the
junctions in numerical order and:

(1) Attaching to a junction all labels which do not
conflict with junctions previously assigned; i.e. if it is
known that a branch must be labeled from the set S, do not
attach any junction labels which would require that the
branch be labeled with an element not in S.

(2) Looking at the neighbors of this junction which have
already been labeled; if any label does not have a
corresponding assignment for the same branch, then eliminate
it.

(3) Whenever any label is deleted from a junction, look
at all its neighbors in turn, and see if any of their labels
can be eliminated. If they can, continue this process
iteratively until no more changes can be made. Then go on to
the next junction (numerically). The junction which was
being labeled (as in step (1)) at the time a label was
eliminated (struck out in the figure) is noted next to each
eliminated label in figure 2.9.

The fact that these results can be produced by the
filtering program says a great deal about line drawings
generated by real scenes and also about the value of precise
descriptions. There is sufficient local information in a
line drawing so that a program can use a procedure which
requires far less computation than does a tree search
procedure. To see why this is so, notice that if the
description the program uses is good enough, then many
junctions must always be given the same unique label in each
tree search solution; the filtering program needs to find
such a label only once, while a tree search procedure must go
through the process of finding the same solution on each pass
through the tree.

Quite remarkably, all these results are obtained using
only the topology of line drawings plus knowledge about which
region is the table and about the relative brightness of each
region. No use is made (yet) of the direction of line
segments (except that some directional information is used to
classify the junctions as ARROWs, FORKs, etc.), nor is any
use made of the length of line segments, microstructure of
edges, lighting direction or other potentially useful cues.

2.5 HANDLING BAD DATA

So far I have treated this subject as though the program
would always be given perfect data. In fact there are many
types of errors and degeneracies which occur frequently.
Some of these can be corrected through use of better line
finding programs and some can be eliminated by using stereo
information, but I would like to show that the program can
handle various problems by simple extensions of the list of
junction labels. In no case do I expect the program to be
able to sort out scenes that people cannot easily understand.

Two of the most common types of bad data are (1) edges
missed entirely due to equal region brightness on both sides
of the edge, and (2) accidental alignment of vertices and
lines. Figure 2.10 shows a scene containing instances of
each type of problem.

The program handles these problem junctions by
generating labels for them, just as it does for normal
junctions. It is important to be able to do this, since it
is in general very difficult to identify the particular
junction which causes the program to fail to find a parsing
of the scene. Even worse, the program may find a way of
interpreting the scene as though the data were perfect and it
would then not even get an indication that it should look for
other interpretations.

2.6 ACCIDENTAL ALIGNMENT

Chapter 7 treats a number of different types of
accidental alignment. Figure 2.11 shows three of the most
common types which are included in the program's repertoire;
consider three kinds of accidental alignment:

(1) cases where a vertex apparently has an extra line
because an edge obscured by the vertex appears to be part of
the vertex (see figure 2.11a),

(2) cases where an edge which is between the eye and a
vertex appears to intersect the vertex (see figure 2.11b),
and

(3) cases where a shadow is projected so that it
actually does intersect a vertex (see figure 2.11c).

2.7 MISSING LINES

I have not attempted to systematically include all
missing line possibilities, but have only included labels for
the most common types of missing lines. I require that any
missing line be in the interior of the scene; no line on the
scene/background boundary can be missing. I also assume that
all objects have approximately the same reflectivity on all
surfaces. Therefore, if a convex line is missing, I assume
that either both sides of the edge were illuminated or that
both were shadowed. I have not really treated missing lines
in a complete enough way to say much about them. There will
have to be facilities in the program for filling in hidden
surfaces and back faces of objects before missing lines can
be treated satisfactorily.

In general the program will report that it is unable to
label a scene if more than a few lines are missing and the
missing line labels are not included in the set of possible
junction labels. This is really a sign of the power of the
program, since if the appropriate labels for the missing line
junctions were included, the program would find them
uniquely. As an example, the simple scene in figure 2.12
cannot be labeled at all unless the missing line junctions
are included.

2.8 REGION ORIENTATIONS

Regions can be assigned labels which give quantized
values for region orientations in three dimensions. These
labels can be added to the junction labels in very much the
same way that the region illumination values were added.

Clearly there are considerable obstacles to be overcome
in extending this work to general scenes. For simple curved
objects such as cylinders, spheres, cones, and conic
sections, there should be no particular problem in using the
type of program I have written. (For a quite different
approach to the handling of curved objects, see Horn 1970.) I
also believe that it will be possible to handle somewhat more
general scenes (for instance scenes containing furniture,
tools and household articles) by approximating the objects in
them by simplified "envelopes" which preserve the gross form
of the objects yet which can be described in terms like those
I have used. In my estimation such processing cannot be done
successfully until the problem of reconstructing the
invisible portions of the scene is solved. This problem is
intimately connected with the problem of using the stored
description of an object to guide the search for instances of
this object, or similar objects in a scene. The ability to
label a line drawing in the manner I describe greatly
simplifies the specification and hopefully will simplify the

ABSTRACT



This paper discusses some recent changes and additions to the
vision system. Among the additions are the ability to use visual
feedback when trying to accurately position an object and the
ability to use the arm as a sensory device. Also discussed are
some ideas and a description of preliminary work on a particular
sort of higher level three-dimensional reasoning.



This paper was originally published as Vision Flash 26.

JIGGLING A BLOCK INTO PLACE

The vision system can now use visual feedback when trying to
accurately position a block. This is done without a costly
rescanning of a significant portion of the scene by using our
knowledge of where the block should be to direct the eye. The
basic idea is to determine the block's actual location by looking
for certain key vertices using a circular-scanning vertex finder
developed by Winston and Lerman <Vision Flash 24>.

When placing a block the arm sometimes makes positional
errors up to half an inch and rotational errors of about 10
degrees. These errors are caused by poor hand placement due to
hysteresis and general slop in the arm's joints and by poor
information about the brick's initial position and dimensions due
to a distorted line drawing. Although these errors can be
disastrous in delicate tasks such as stack-building, they are
small enough to allow us to use the scheme described below.

The organization of the theorems is shown in figure 1.
TC-JIGGLE, the top level theorem, first calls TC-FIND-BODY whose
goal is finding the actual location of the just moved brick.
This is done by locating a three-vertex 'skeleton' on either the
top or bottom of the brick, examples of which are shown in figure
2. Candidate skeletons are suggested by the theorems
TC-LOOKFOR-TOP, TC-LOOKFOR-BOTTOM, and TC-LOOKFOR-SKELETON which
predict the locations of vertices and decide whether they should
be visible. TC-FIND-BODY then locates the three vertices
comprising the skeleton with the circular-scan vertex finder and
calculates the true position of the brick. If it fails to find
one of the vertices, it asks for another skeleton and tries again.

Once the location of the brick is found, TC-SHIFT-BODY
calculates the positional and rotational errors and, if they are
greater than a tolerance, corrects them through a call to
TC-MOVE-GENTLY. This theorem differs from the usual TC-MOVE in
calling the arm with GRASP and UNGRASP commands instead of PICKUP
and DROP. PICKUP and DROP raise the arm several feet above the
table when moving to avoid obstacles, whereas GRASP and UNGRASP
lift the hand less than an inch (using the wrist) and thus,
hopefully, are less prone to error.
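
The tolerance test made before correcting a brick's pose can be
sketched as follows. This is an illustration only: the pose
representation (x, y, heading in degrees), the tolerance values,
and the function name correction are assumptions, not details of
TC-SHIFT-BODY.

```python
import math


def correction(expected, actual, pos_tol=0.1, rot_tol=2.0):
    """Return (dx, dy, dtheta) to apply, or None if within tolerance.

    Poses are (x, y, heading_degrees). The tolerances are hypothetical.
    """
    dx = expected[0] - actual[0]
    dy = expected[1] - actual[1]
    # Wrap the heading difference into [-180, 180) degrees.
    dtheta = (expected[2] - actual[2] + 180.0) % 360.0 - 180.0
    if math.hypot(dx, dy) <= pos_tol and abs(dtheta) <= rot_tol:
        return None               # close enough: leave the brick alone
    return dx, dy, dtheta         # hand these to the gentle move


# A brick found half an inch and 8 degrees away from where it belongs.
shift = correction((10.0, 10.0, 0.0), (10.3, 9.6, 8.0))
```

Wrapping the angle keeps a brick expected at 5 degrees and found
at 355 degrees from being treated as a 350 degree error.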

The most difficult part of this jiggling procedure is
determining which vertices of a brick will be visible and not
obscured by other objects. We must also avoid looking for
vertices which are adjacent to others already in the scene, for
example the vertices where two bricks are aligned. Such
situations may confuse the vertex finder and cause it to find the
wrong vertex. Since these theorems are written to work in the
context of a copying task, they use information about the model
scene that is being copied. For instance, before TC-LOOKFOR-TOP
looks for any vertex on the top of a brick it must either find
that:

1. The top of the matching brick in the model was
completely visible, or

2. All bricks which could be adjacent to the one in
question are either below it or have not yet been placed.

The theorems lean toward conservatism in accepting vertices as
good candidates to look for and will reject all of them in some
cases.

One exciting possibility for further work is the 
incorporation of a model of the hand. With it we could adapt the 
system to avoid vertices occluded by it, doing away with the 
necessity to release the brick and withdraw the hand. This would 
result in a more dynamic and accurate feedback system.

OUR ROBOT HAS A HAND, TOO

Until now the vision system has made no use of its arm in
getting information about the world. We now have a limited
ability to reach out and touch in order to disambiguate some
scenes, using a new arm primitive written by Jerry Lerman.

Sending (TOUCH X Y) to the arm causes it to position itself
above the point (X,Y), slowly descend until it touches something,
and report its final height. An optional third argument can
specify a maximum height at which something is expected, allowing
the arm to rapidly drop to this height and then more slowly feel
its way downward.
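
The behavior of the primitive can be mimicked by a small
simulation. This is a sketch under assumptions, not the arm
code: the world is a hypothetical surface_height function,
heights are integers in tenths of an inch, and the starting
height and the name touch are invented for the example.

```python
def touch(surface_height, x, y, expected_max=None, start=360):
    """Simulate feeling downward over (x, y).

    Heights are integers in tenths of an inch (assumed units).
    Returns (final_height, slow_steps).
    """
    top = surface_height(x, y)
    z = start
    if expected_max is not None and expected_max < z:
        z = expected_max          # rapid drop to the expected maximum
    z = max(z, top)               # contact stops the arm in any case
    slow_steps = 0
    while z > top:                # slow descent, one unit at a time
        z -= 1
        slow_steps += 1
    return z, slow_steps


def pedestal(x, y):
    """Hypothetical scene: a 4 inch tall block covering 0 <= x, y <= 2."""
    return 40 if 0 <= x <= 2 and 0 <= y <= 2 else 0


without_hint = touch(pedestal, 1, 1)
with_hint = touch(pedestal, 1, 1, expected_max=60)
```

Both calls report the same final height; the optional argument
only shortens the slow part of the descent.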

A series of theorems have been written which actively use
the arm as a sensor and other theorems have been taught to use
them, resulting in the system network shown in figure 3. With
these theorems we can now handle scenes such as the pedestal in
figure 4. In this scene we can't determine the tallness of B1,
since it could touch the bottom of B2 near the front, the back,
or somewhere in between. As a result, we can't get the
dimensions and location of B2 either.

We can however determine the location of B1 in the X-Y plane
(through TC-FIND-LOCATION-BOTTOM). Moving the arm down over this
spot until it touches the top of B2 gives the altitude of B2's
top. With this information we can calculate the location and
dimensions of both bricks.

Previously, when we wanted to find the location or
dimensions of a brick we had to find its altitude above the
table. If it was not resting on the table, we had to find the
dimensions of its supports, necessitating knowing their altitudes
above the table, etc. We recurse downward until we reach the
table or fail by hitting a brick for which no tallness or
altitude can be found. With these new theorems we have another
alternative: recursing upward until we find a brick we can touch
with the arm.

One problem is that we aren't working with a very good three
dimensional model. TC-TOUCHTOP is the theorem which tries to
touch the top of a brick. Checking first that there is nothing
above the brick, it tries to touch it above the center of one of
its supports. The brick could, however, not be above this spot
(as in figure 4b) causing the arm to miss it. One precaution
that TC-TOUCHTOP takes is calculating the minimum height to
expect the top of the brick. If it touches something below this
height, it assumes it missed.

TC-FIND-SUPPORTS

TC-FIND-SUPPORTS and its related theorems have been modified
to handle situations with which they previously could not cope.
Figure 5 shows the new organization of this part of the system.
The strategy of TC-FIND-SUPPORTS was to take each object below
the brick in question (found through TC-FIND-ABOVE-1 and -2) as a
support candidate. The altitude and tallness of each candidate
were found and summed. The object or objects (if there were
several with nearly equal combined altitude and tallness) with
the largest sum were then taken as the actual supports. This sum
was then asserted as the altitude of the supported brick. The
theorem failed if it could not find an altitude or a tallness for
one of the bricks below.
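
The old strategy just described can be sketched in a few lines.
This is only an illustration: the function name find_supports,
the dictionary representation of candidates, and the value of
eps (what counts as "nearly equal") are assumptions.

```python
def find_supports(candidates, eps=0.05):
    """Old strategy: the candidates with the highest tops are the supports.

    candidates: name -> (altitude, tallness); either may be None if unknown.
    Returns (sorted support names, altitude of the supported brick),
    or None on failure.
    """
    tops = {}
    for name, (altitude, tallness) in candidates.items():
        if altitude is None or tallness is None:
            return None           # the old theorem simply failed here
        tops[name] = altitude + tallness
    if not tops:
        return None
    best = max(tops.values())
    # Hypothetical tolerance for "nearly equal" sums.
    supports = sorted(n for n, top in tops.items() if best - top <= eps)
    return supports, best


found = find_supports({"B2": (0.0, 3.0), "B3": (1.0, 2.0), "B4": (0.0, 1.5)})
```

In the example B2 and B3 have equal tops and are both taken as
supports, while the shorter B4 is rejected.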

The new TC-FIND-SUPPORTS works in much the same way, but has
been modified to handle many cases where the tallness or altitude
of a support candidate cannot be found. In such cases it
determines the minimum height that the top of the candidate could
have.

It will also yield useful information in cases where it is
still ambiguous which objects support another. Before failing it
makes assertions of the form:

(B1 may-be-supported-by B4)

(B1 may-be-supported-by B7)

(B1 has-minimum-altitude 4.12)

These assertions can later be used by other theorems with more
real world knowledge to clarify the scene. For example, we might
call on a theorem which knows about stability or one which can
recognize a table top and legs to decide who is doing the
supporting.

Two auxiliary theorems are used, TC-ADD-TO-SUPPORTS-1 and
-2, which contain some 3-D knowledge. TC-ADD-TO-SUPPORTS-1 looks
for a marrys relation between the brick in question and a support
candidate. If one is found, the theorem reports that it must be
a support (assuming gravity and no glue). TC-ADD-TO-SUPPORTS-2
is explained below.

The capabilities of the new TC-FIND-SUPPORTS are best shown
in the scenes in figure 5. For each of these scenes the old
TC-FIND-SUPPORTS would simply fail, leaving no assertions in the
data base. Figure 5e is particularly interesting, showing the
application of some three dimensional reasoning. On this figure
TC-FIND-SUPPORTS first calls TC-FIND-SUPPORT-CANDIDATES which
reports that B2 and B3 are likely support candidates and that B1
must have an altitude of at least T. TC-ADD-TO-SUPPORTS-1 then
finds that B2 marrys B1 along B1's bottom edge, implying that B2
must support B1 and that B1 has an altitude of T.
TC-ADD-TO-SUPPORTS-2 is activated and notes that B1's altitude is
now known to be T. Discovering that the minimum tallness of B3 is
also T (within an epsilon) it asserts that B3 must also marry B1
and be a support.

Assertions made by TC-FIND-SUPPORTS:

(B1 is-supported-by B2)
(B1 is-supported-by B4)
(B1 is-supported-by B5)
(B1 is-supported-by B3)
(B1 has-altitude t)

(B1 is-supported-by B2)
(B1 has-minimum-altitude t)

(B1 is-supported-by table)
(B1 has-minimum-altitude t)

(B1 may-be-supported-by B2)
(B1 may-be-supported-by B3)
(B1 may-be-supported-by B4)
(B1 has-minimum-altitude t)

(B1 is-supported-by B2)
(B1 is-supported-by B3)
(B1 has-altitude t)

CHANGES TO TC-SKELETON
TC-SKELETON has been changed to return a little more
information if it can't find a complete skeleton for a brick.
When TC-SKELETON fails to find a line of a particular type it
tries to find the longest line fragment of that type and makes
partial-skeleton assertions of the form:

(B3 type-two * (V1 V2))

(B3 has-partial-skeleton V4 V1 * (V1 V2) V1 V14)

Figure 6 shows some examples of partial skeletons.

This partial skeleton information is used by other theorems 
which hypothesize what the rest of the brick may be like. From 
it we can get a handle on some three-dimensional information such 
as a brick's orientation, its minimum dimensions, and some idea 
of its location. Since other theorems make hypotheses that
complete parts of the skeleton, an antecedent theorem has been
added to keep the skeleton assertions up to date.

FIGURE 6

(B4 has-partial-skeleton * (V1 V2) * (V1 V3) V2 V4)
(B1 has-partial-skeleton V1 V2 V1 V3 * (V3 V4))

TC-LINE-BELONGS-TO-REGION

A new theorem exists which will determine whether a line is
physically associated with a particular region in a picture.
This question crops up in numerous places in the vision system:
finding the skeleton of a brick, finding a brick's tallness,
finding bottom lines, deciding whether a face of a brick is
vertical or horizontal, etc. In each of these places several
heuristics were employed to find lines which 'belonged' to a
region. TC-LINE-BELONGS-TO-REGION is a collection of these
heuristics and several new ones which can be called whenever
needed. It makes assertions of the form:

(L-VVV2 belongs-to R4) 
One of the heuristics is that an interior line of a body
belongs to both regions it bounds, so that
TC-LINE-BELONGS-TO-REGION should not be used on self-occluding
bodies. In general, the heuristics are applied conservatively,
sometimes calling on the theorem TC-OCCLUSION-1 to look for
supportive or contradictory evidence. Consequently some lines
will not be recognized. Even so, having this information at hand
makes many tasks much easier. For example:

* A region is a vertical face if there is a vertical 
line which belongs to it. 

* Two objects marry if we find a line which belongs to 
a region from each body. 
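
The two examples above might be expressed over a set of belongs-to assertions roughly like this. The sketch is in Python rather than the system's theorem language; the function names and the representation of assertions as (line, region) pairs are assumptions made for illustration.

```python
def is_vertical_face(region, belongs, vertical_lines):
    """A region is a vertical face if some vertical line belongs to it.

    belongs        -- set of (line, region) pairs, one per belongs-to assertion
    vertical_lines -- set of line names already known to be vertical
    """
    return any(line in vertical_lines
               for line, r in belongs if r == region)


def marry(regions_a, regions_b, belongs):
    """Two bodies marry if one line belongs to a region from each body."""
    lines_a = {line for line, r in belongs if r in regions_a}
    lines_b = {line for line, r in belongs if r in regions_b}
    return bool(lines_a & lines_b)
```

Because the heuristics are applied conservatively, a False result here means only that no such line was recognized, not that none exists.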

How many things can you find wrong with this picture?

In a picture such as the one above, where one brick obscures
the bottom of another, we can extract some information on the
whereabouts and size of the obscured brick. Examine the scene in
figure 7. B2 could be a very tall brick which was touching B1,
or a shorter brick far behind B1, or anywhere in between. It's
ambiguous. Knowing the range of possible heights and resulting
locations of the obscured brick will be quite useful to other
theorems which try to decide what the situation really is. To
get quantitative three-dimensional information we use the
procedures described below.

To get the maximum tallness of B2 we assume that it is
directly behind and touching B1. Assume for the moment that the
ends of B2's three vertical edges (b, c & d) touch the edge a-e.
These three points would then have an altitude of h (the 3-D
tallness of B1) from the table. From this we can get their 3-D
coordinates and the resulting lengths of the three verticals. We
then take the shortest of these lengths as the maximum tallness
of B2. This corresponds to selecting the vertical ending in c as
the only one which could touch B1.

To get the minimum tallness we assume that B2 is far behind
B1 and its bottom edges just barely obscured. For the moment
assume all three points are on the table. We get their 3-D
tallness coordinates and calculate the lengths of the verticals.
The maximum of these three lengths gives us a minimum tallness
for B2. Taking the longest corresponds to selecting the point b
as the only one which could actually be on the table.

The problem with the picture on the previous page is that,
assuming the obscured object is a brick, its apparent minimum
tallness is greater than its apparent maximum tallness.
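
The two bounds, and the consistency check that exposes the faulty picture, can be sketched as below. This is a hedged illustration in Python: the function names are hypothetical, and the 3-D lengths of the verticals under each assumption are taken as already computed, since the projection from picture to 3-D coordinates is not described here.

```python
def tallness_bounds(lengths_if_touching, lengths_if_on_table):
    """Return (minimum, maximum) tallness of a brick whose bottom is obscured.

    lengths_if_touching -- 3-D lengths of the visible verticals, assuming
                           each lower end touches the top edge of the
                           obscuring brick (altitude h)
    lengths_if_on_table -- 3-D lengths of the same verticals, assuming
                           each lower end rests on the table
    """
    # Only the shortest vertical could really touch the obscuring brick.
    maximum = min(lengths_if_touching)
    # Only the longest vertical could really reach the table.
    minimum = max(lengths_if_on_table)
    return minimum, maximum


def could_be_a_brick(minimum, maximum):
    """The brick interpretation is consistent only if minimum <= maximum."""
    return minimum <= maximum
```

A picture like the one on the previous page would make could_be_a_brick return False.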

A further refinement of this heuristic is shown in figure
8. In this picture, to get the minimum tallness of the obscured
brick we assume that the points a and b rest not on the table,
but on the regions R1 and R2 respectively. We must check, of
course, that these regions are not vertical, as in figure 8b.

The same situation exists for the inverted case, the
pedestal in figure 9. If we assume that B1 supports B2, we can
put upper and lower bounds on the height of B1 (and the resulting
altitude of B2). We know that the top of B1 must touch the
bottom of B2 somewhere near the front, the back, or in between.
Getting the minimum tallness is trivial: just measure the length
of the three vertical edges of B1 and take the largest of these.
This corresponds to a situation where B1 and B2 marry along their
front edges. To get the maximum tallness we start off by
locating the visible bottom edges of B2 and predicting where the
others should be. We then extend each vertical until it
intersects one of the back bottom edges of B2. After calculating
the 3-D lengths of these verticals we take the shortest as being
the upper bound on the height of B1.

Considering the stability of such structures would of 
course lead to more refined upper and lower bounds.

ONE KIND OF THREE-DIMENSIONAL REASONING

Everyone agrees that future advances in computer vision will 
come with the incorporation of 3D knowledge and reasoning. One 
type of 3D reasoning we could add is shown in figure 10. In 
figure 10a the average human viewer (trusting and non-cynical) 
sees two identical bricks supporting a third, even though B1
could be longer or shorter than B2. Because B1 and B2 serve the
same function (supporting B3), have the same orientation, and as
far as we can see, could be the same size, we easily assume that
they are. Similarly, in figure 10b, we see B4 as being the
same size as the other three standing bricks. Here the evidence
is: 

* There is a preponderance of tall, thin standing bricks

* B1, B2, and B3 form a row, which would include B4
if it were the same size

* B1, B2, B3, and B4 all have the same orientation

* B4 could be the same size as B1, B2 and B3

Humans do this hypothesizing and filling in of details to a
great degree. It is an integral part of perception, as it would 
be impossibly costly (in time and effort) to try to disambiguate 
everything we see. Teaching the machine to do such hypothesizing 
would be a natural way to incorporate some three dimensional 
reasoning and to enable it to consider the global context of the 
scene.

A similar form of reasoning was pointed out by Winston in
his thesis <Learning Structural Descriptions from Examples, AI
TR-231>. In figure 11, B2 appears to be a wedge, while we see
B4 as a brick, even though they show the same arrangement of
lines and faces. In both cases, since the two objects form stacks
and are exactly aligned, we first assume that they are identical.

We might object that this presupposes an orderly scene, one
which is carefully set up or contrived. However, the world we 
humans create, and which robots may inherit, is just such an 
orderly, contrived world. Even in the mini-world of plain white- 
faced blocks we use in our present research we tend to build 
little arches, stacks, and other structures containing identical, 
interrelated parts. In the larger world of human construction 
this orderliness is more apparent. We tend to build things which 
are symmetric and unsurprising in details. Complex objects such 
as buildings, electronic circuits, and cars are built using 
smaller identical parts (e.g. standard-sized windows, resistors, 
bolts). Who would suppose that the wheels on one side of a car 
are any different than those on the other? Long hallways usually 
have identically dimensioned doors uniformly spaced. The legs
of a table or desk are nearly always the same. 

We might also object that a robot would do better to spend 
his time trying to disambiguate a scene by removing some 
obscuring obstacles, by walking to one side, or by reaching out 
his hand and touching. In many cases however, our robot may find 
it impossible to change his viewpoint or to interact with the
scene. Even more importantly, the ability to do this sort of
reasoning would allow him to have some expectation as to what
will most probably be seen if obscuring objects are removed, or
the viewpoint is changed. The robot can then quickly test his
previously formed hypothesis. If this verification fails, he can 
flush the hypothesis and examine the scene more carefully.

The following pages describe initial work in creating a
system of theorems to do just this sort of reasoning. A skeletal
system has been implemented which will handle a number of the
simpler situations (such as those with fairly obvious group type
relationships: rows, stacks, tables, arches, etc.).

Whenever we can't find the complete dimensions of a badly 
obscured object, as in figures 10a and 10b, we check for evidence 
that this mystery object might be just like some other object in 
the scene. Typical of the evidence we look for are: 

* Chains of relations (rows, stacks, walls)

* Identical relations to a third object, or functional
similarities (e.g. supports of an arch, table legs)

* A preponderance of identical objects in the scene

* Exact alignment with another object

* Other relations and properties such as attitude,
nearness, placement, marrys, abuts, etc.



If we find such a candidate we check for contradictory
evidence. Things we check are:

* That the known dimensions of the obscured object
agree with the corresponding dimensions in its
supposed double.

* That the unknown dimension(s) lie within the
apparent maximum and minimum bounds
we have calculated.

* That the hypothesis yields no colliding objects (i.e.
those occupying common space).

* That we do not contradict any previous assumptions
we have made (e.g. support).

In some cases we need not require an object which might be
an exact double, but only one which implies a similarity that
might resolve the hidden dimensions. For example, in figure 12
we hypothesize that the two stacked bricks have the same
depth, even though their heights are different.

If we find any contradictory evidence, we reject the
candidate and can look for another. If no contradictory evidence
is found, then we make our calculations and tentatively assert
the dimensions and location of the object given our hypothesis.
This is done with pointers to the theorems which suggested this
hypothesis so that we can reconstruct our reasoning if needed.
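
The dimensional part of this check-and-reject loop could look something like the following. It is a minimal sketch in Python under stated assumptions: the name hypothesize_double, the dictionary layout, and the tolerance are invented for the example, and the collision test and the check against earlier assumptions are left out.

```python
def hypothesize_double(known, bounds, candidates, tolerance=0.05):
    """Return (name, dimensions) for the first candidate double that
    survives the dimensional checks, or None if all are rejected.

    known      -- dict: dimension -> measured value for the obscured object
    bounds     -- dict: hidden dimension -> (minimum, maximum)
    candidates -- list of (name, dict: dimension -> value), best first
    """
    for name, dims in candidates:
        # Known dimensions must agree with those of the supposed double.
        known_ok = all(abs(value - dims[d]) <= tolerance
                       for d, value in known.items())
        # Hidden dimensions must lie within the calculated bounds.
        hidden_ok = all(low <= dims[d] <= high
                        for d, (low, high) in bounds.items())
        if known_ok and hidden_ok:
            # Tentative result; the candidate's name is kept as a pointer
            # back to the evidence that suggested the hypothesis.
            hidden = {d: dims[d] for d in bounds}
            return name, {**known, **hidden}
        # Contradictory evidence: reject and look for another candidate.
    return None
```

Returning the candidate's name alongside the dimensions is one simple way to keep the reasoning reconstructible.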

With this information we can proceed to analyze the scene
and hope that we have not been tricked. Alternatively we can try
to verify the hypothesis by some other means. For instance, we
could create a daemon theorem to lurk in the background waiting 
for some of the obscuring objects to be removed. When they are, 
the eye can be asked to verify critical lines or vertices 
suggested by our hypothesis. For our paradigm arch case, as soon 
as the top brick is removed, we can look for the top back 
vertices of the mystery brick. If the vertices aren't found near 
their proposed locations then the hypothesis is flushed. In such 
a case we could use the eye to more completely scan the area to 
resolve the problem. In other situations we might be able to use 
the hand as a sensor to verify the hypothesis or find out what 
has gone wrong. 
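
A daemon of this kind might be sketched as follows. The sketch is in Python and is only illustrative: the class names, the status strings, and the look callback standing in for the eye are all assumptions, not part of the implemented system.

```python
class Hypothesis:
    def __init__(self, obj, predicted_vertices, blockers):
        self.obj = obj
        self.predicted_vertices = predicted_vertices
        self.blockers = set(blockers)   # objects obscuring the view
        self.status = "tentative"


class Daemons:
    """Holds hypotheses that wait for obscuring objects to be removed."""

    def __init__(self):
        self.waiting = []

    def watch(self, hypothesis):
        self.waiting.append(hypothesis)

    def object_removed(self, name, look, tolerance=0.1):
        """Called when an object leaves the scene.

        look(vertex, tolerance) -> bool stands in for asking the eye
        whether a vertex is found near its proposed location.
        """
        for h in list(self.waiting):
            if name not in h.blockers:
                continue
            h.blockers.discard(name)
            if h.blockers:
                continue                # still obscured; keep lurking
            found = all(look(v, tolerance) for v in h.predicted_vertices)
            h.status = "verified" if found else "flushed"
            self.waiting.remove(h)
```

For the arch case, the hypothesis about the mystery brick would list the top brick as its only blocker, so removing it triggers the check at once.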

ABSTRACT

High level semantic knowledge will be employed in the development
of a machine vision program flexible enough to deal with a class
of "everyday objects" in varied environments.

This report is in the nature of a thesis proposal for future
work.

This paper was originally published as Vision Flash 33.

Foreword

This material is not presented as a discovery but rather as
a journey on the path laid down by Professors Minsky, Papert and
Winston in Vision. Please ignore any dogmatic tone which may
appear in the efforts to overcome my natural tendency to the
opposite "I think, perhaps, maybe..." school of rhetoric.



Abstract 

A proposed program of research in Machine Vision is 
described. The first section, the Scenario, lays out succinctly
and informally the aims of the project and summarizes the
background of the work, the problems to be faced and the approach
to be taken. An extended description of the Problem follows with
its background and its formulation for the present work. We next
discuss the Artificial Intelligence Issues involved, and the
Approach we will take. Benchmarks are laid down for specific
progress. Our research interests are once again carefully
formulated in the Summary.

Scenario

Picture the following scenario. 

We walk into the street, and hijack the first passerby. We 
have him go home, retrieve his toolbox and return to the AI 
Robot. He grabs a handful of tools and dumps them in front of
the eye. The Robot identifies these objects. 

The key element here is flexibility in dealing with real 
objects in real situations (surreal perhaps in this instance). 

The flexibility to deal with reality will be the basic aim 
of our research. 

Why has previous work in visual recognition often been sadly 
lacking in this regard? Why does this effort have a better 
chance of success? 

Consider first some recent contributions. 

1. Minsky and Papert -- The heterarchical approach essential
to the flexibility sought (4,5,6,7).

2. Winston -- Network structure, data driven control, basic
structural description, learning by failure, grouping,
quantitative analysis of relational concepts (11,12,13).

3. Shirai -- A.I., and vision research in particular, as an
experimental science, as demonstrated most recently by Shirai (8).

4. Waltz -- The extent to which completeness of description
can replace flexibility of control (9).

5. Charniak -- The role and extent of knowledge in
understanding (1).

6. Winograd -- Semantic/syntactic integration. Frontal
attack on global problem; large system structure.
Visual/linguistic analogies (10).

7. Hewitt -- The power of a problem oriented control
structure. The ease of a pattern directed data base. Two way
flow of control; flexibility of nondestructive failure (3).

As a framework for these developments, we have the Vision
System. This has provided the experience and facilities for a
broad based background in vision research plus the specific
technical and organizational results of the System itself
(complemented by the more formal seminar structure).

Perhaps most importantly, we want to deal specifically with
the problem suggested in the first paragraph. This is very
different from most previous efforts at visual recognition.

Often the descriptive problem has been dealt with separately 
as a pattern recognition issue. When real visual input has been 
used, the aim has been generally to transform this input, with a 
single low level predicate, into a "preprocessed" data base. 
Features could then be extracted from this data base and dealt 
with as a pattern recognition or tree search problem. 

In fact, we maintain that there is no real problem, on the 
macroscopic level, in distinguishing a respectable set of real 
objects. In some respects, the more complex and unusual the 
objects the better. It should not be at all difficult to 
"separate these objects in parameter space" with any number of
redundantly distinguishing features.

The problem is to get some of these gross features: 

a. from amid all this redundant mass of information

b. in particular, from the mass of "meaningless" low level
data

c. in differing: 

1. individual samples of objects (variations in size, 
shape, orientation, texture, etc.) 

2. scene contexts (occlusions, shadows, etc.). 

On a local level these changing conditions create great 
differences. If we deal only from the "bottom up" we cannot hope 
to deal with such chaotic variation. 

However, with higher level interaction we can hope for 
global guidance at levels that can deal with such variation. The 
old saw: "it's easier to find something if you know what you're
looking for". We need to know what we are looking for to find 
the features. 

I think my goals can be approached in two stages. 

1) First I will pick a single object of some semantic richness, 
e.g. a hammer. I will then write the world's greatest hammer 
recognizer. This is not as trivial as it may sound. 

The program will, needless to say, not merely reply "hammer"
when faced with any object, on the grounds that hammer is the
only object it knows about. It will have to decide whether the
object is a hammer or not a hammer.

It should be able to make this decision about your
roommate's favorite hammer (or screwdriver). It should be able
to decide after a stranger is allowed to place the object in the
field of view and arrange the lights. It should decide correctly
when visiting Cub Scouts come in and are allowed to empty their
pockets over and around the object to be studied.

This is a hard problem. 

2) Assuming Part I is not a thesis in itself, the next stage will
be to integrate knowledge of several objects into a system.
Hopefully the experience of Part I will provide principles for
encoding visual knowledge that will facilitate adding the initial 
ability to deal with several new objects. 

We will want to design the analytical structure to take 
advantage of the hypothesis and verification techniques developed 
in Part I. Note that the problem domain of Part I encourages us
to explore the practical implications of our philosophical bias:
application of specific higher level knowledge. 

Problem

Machine Vision involves both "low level" image processing
and "high level" descriptive analysis. The failure of Machine
Vision to date has been in adequately linking these levels.

Much of the work in this field has restricted itself rather 
strongly to one or the other of these levels. There are several 
reasons for this. In the beginning there was a tendency for 
image processing people to believe they could "do it all" with a 
clever enough set of Fourier Transforms. The descriptive 
analysts, on the other hand, tended to underestimate the problem 
of preprocessing a suitable data base for their descriptive 
schemes. 

Problems in each of these areas have been formidable enough 
to perhaps demand single-minded concentration. But now that a 
groundwork has been laid we are in a position to take a more 
comprehensive approach. It has in fact become clear that such an 
approach is mandatory for continued progress in Machine Vision 
(4-7,11-13). The pressures for thesis work at a single level of 
the problem have begun to lead away from the overall goal. 
Search for the most sensitive line finder sidesteps the natural 
context in which the problem is to decide which lines to ignore, 
not which obscure lines to find. Yet work continues on idealized 
"picture grammars" which assume the preprocessor has given them 
just the data base they need (without any knowledge of those 
needs).

Part of the reason that work on these two aspects of the
vision problem has often remained unnaturally isolated must be
the distinction in interests and skills required for work in the
two fields. We therefore think it might be advantageous for two
students to pool their experience in these separate areas. In
any event a "pincer" attack is called for specifically on the
interface between high level and low level analysis.

This does not mean regarding present accomplishments in
these areas as "black boxes" and wiring them together. Rather,
processes at each level must be organized with the requisite
cooperation in mind.

Seeing at any level beyond the feature point is a matter of
organizing visual information according to some visual model of
the world. In the past we arbitrarily used only the most
elemental physical units of the model to process the visual
data -- the most basic physical regularities and anomalies such
as lines and points. The rest of the higher level knowledge was
stored separately to be "matched" against data that could
hopefully be organized into meaningful structures without knowing
which structures were meaningful. This knowledge must also be
part of the visual machinery that organizes the raw visual input
and thus "sees". Instead this information has been so independent
of any inherently visual process that researchers have been able
to point pridefully to their ability to perceive and program this
"recognition" apparatus as instances of highly general abstract
mathematical models.

In point of fact, higher level context assumptions (a
parallelepiped environment, for example) have long been implicit,
or even explicit, in much of our work. Experience has shown the
utter necessity for such guidance in order to make sense out of a
chaotic sensory reality. It is time to fully integrate our
higher level processing into the organization of our visual data. 

We have seen how difficult it is to obtain even a line 
drawing of a cube without suspecting that we are looking at a 
cube. How can we expect to obtain some idealized data
representation of a horse and then recognize that representation
as a horse? We cannot choose the ideal descriptive model of a
horse analytically and expect some syntactic predicate to produce 
this description for us. 

The conclusion we come to is that there is a need to 
eliminate the old distinctions between preprocessing, description 
and recognition. One cannot obtain a preprocessed result which 
is then analyzed to produce a description which is then matched 
with a model to be recognized. Rather, seeing is a homogenized
process: the "model" is built into the structure of the
processor; the description of a data set is the protocol of its
processing. Recognition is coincident with description.

Recognition is not a matching of templates against a 
processed data structure. Rather it is a many layered silk
screen process that begins with the raw input and leads to the
richly patterned perceptual world we seek.

Issues

We feel that an approach which is predicated on the intimate
interaction of descriptive apparatus and description predicates
is the best hope for practical progress in Machine Vision.
Beyond that we believe that such a project would provide an 
appropriate laboratory for the illumination of several current 
A.I. conceptual concerns. 

1. Heterarchy

Problems in reconciling low level results with high level 
idealizations, even in a very simplified domain, provided one of 
the early motivations for the heterarchy concept (4-7,12). 
Return of recognition level advice to the preprocessor is perhaps 
the most salient example of heterarchy practice (4). It is 
significant that, except on a most general level, even this case 
study of heterarchy has not been implemented and its implications 
certainly not fully explored. 

Our investigation of heterarchy in vision would emphasize 
the crucial role of recognition level knowledge. 

We do not analyze heterarchy in terms of interaction or 
advice between major discrete processing modules or stages. We 
do not view "high level" knowledge as criticizing the descriptive 
structure resulting from low level "preprocessing". Rather we 
believe that in general no useful structure can be derived 
without high level knowledge directly involved in the 
construction process from the beginning.

In any case, a working fully ramified case study should 
provide useful insight into the theory and practice of 
heterarchical interaction. In particular: 

2. Sensory/perceptual systems

The heterarchical interaction between low level sensory 
systems and high level perceptual systems is of particular 
interest to A.I. research at present. Having worked "down" from 
chess-playing and integration, A.I. is now facing the far more 
obscure problem of enabling machines to duplicate the essential 
"automatic" processing of real world sensory data. Techniques 
and processes abstracted from our results in melding low level 
and high level visual processing should prove relevant when we
provide our robot with other sensory apparatus. The possibility
of higher level semantic intervention in the auditory analysis of 
speech, for example, is already a live issue. 

3. Knowledge -- "the size of infinity"

In developing a system that can deal with a real world
context we will face certain problems analogous to those faced by
Charniak in his work on understanding children's stories.
Previous successes in machine vision, or even machine
pseudo-vision, i.e. analyses of hypothetical input, have dealt
largely with a severely restricted domain. The recognition set has
been either highly stylized or simply very small. There is an
analogy here to syntactic uni-directional theorem-proving systems,
which break down on a non-restricted data base. First we will have to
get a practical idea of what the size of infinity is in the 
visual domain, and then we will need techniques for organizing 
it. In particular: 

4. Knowledge -- as procedures

Our knowledge -- about shapes as well as subjects -- will be
organized as procedures. This is more than a fashionable device 
in a system where knowledge will often consist of knowing the 
right questions to ask of some other module, preprocessing 
modules in particular. ("Preprocessing" is obviously a misnomer 
here, reflecting an organizational mode we hope to replace.) 

Recognition will not consist of preparing a description of 
an input scene and matching it with a description of a model. 
Rather description and recognition will occur simultaneously 
during the processing of the scene. A description will be 
essentially the protocol of a process, not a result. 

5. Parallel processing 

If parallel processing techniques are to prove significant, 
this would certainly seem to be an area in which they might 
demonstrate their value. We envision a system in which
processing would proceed simultaneously on visual subunits or on
conceptual subprocesses. Proceedings would be dependent
continually on the results of intermodule communications and
questions, both as to the results being attained in different
"higher" level investigations, and the results of additional low
level processing. It would seem plausible that dialogues of the
following style could take place profitably:

a: I think these four things are legs. Is anybody looking
at anything above them all?

b: Yes, I am.

a: Can you tell me if it's planar or bumpy?

b: Not yet.

a: Well, look, work on it; take your time and wake us when
you have something.

a: (Yawn) Wazzat? Bumps huh? Probably an animal. We'll
activate hoof, paw and knee searches. You send your stuff to an
animal body parser.

c: I can't find any subdivisions on this thing for head or
tail. Can anybody make head or tail out of any nearby blobs of
relative size x and position n?



This particular dialogue may be vacuous. Investigation of many 
possible dialogues may reveal some essential usefulness of this 
type of organization. 
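
One way to experiment with such dialogues is a shared message board on which modules post questions and react to answers. The sketch below is in Python and is purely illustrative: the Board, LegFinder and SurfaceAnalyzer classes and the topic strings are assumptions, and the messages are delivered one at a time rather than in true parallel.

```python
from collections import deque


class Board:
    """Shared message board: every message is offered to all other modules."""

    def __init__(self):
        self.queue = deque()
        self.log = []

    def post(self, sender, topic, content):
        self.queue.append((sender, topic, content))

    def run(self, modules):
        while self.queue:
            sender, topic, content = self.queue.popleft()
            self.log.append((sender, topic, content))
            for m in modules:
                if m.name != sender:
                    m.receive(self, topic, content)


class LegFinder:
    name = "a"

    def receive(self, board, topic, content):
        # Bumps above four legs: probably an animal.
        if topic == "surface-above-legs" and content == "bumpy":
            board.post(self.name, "activate", "hoof, paw and knee searches")


class SurfaceAnalyzer:
    name = "b"

    def receive(self, board, topic, content):
        if topic == "question" and content == "planar or bumpy?":
            board.post(self.name, "surface-above-legs", "bumpy")
```

Posting the question "planar or bumpy?" from module a and calling run with both modules produces a three-message exchange that ends with the hoof, paw and knee searches being activated.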

6. Grouping 

As even the brief dialogue above indicated, grouping 
problems will be central. Guzman-like techniques will not 
suffice, particularly for higher level organization. Rather than 
very sensitive local predicates capable of distinguishing fine
variations, we will require broader based cruder predicates
capable of determining relative homogeneity. Contextual,
conceptual cues will be needed to distinguish overlapping objects
from functional groups. A program that knows about collars will
not be dismayed to find a dog's head separated from its body; it 
will be capable of seeking the "severed head". Alternative 
hypotheses will be explored for conjoining parts of the picture, 
ignoring occlusions, or melding sequences into single units. The 
same subscene may be viewed in each of these ways: as linked 
units, overlapping distinct units, single textured units. 
Successful interpretation will direct the choice. 

7. Organization 

Direction will be available from both the lower level and 
the upper level. That is, knowledge about a shape as well as an 
object or class of objects, will be embedded in procedures which
will in turn direct the flow of processing through other 
procedures. This flow will not be locked into any piece-by-piece 
or level-by-level decision tree or network search. The system 
can skip to high level hypotheses, plan an approach on that 
basis, fail, review what it found in the process and use this as 
a basis for some more "from the ground up" investigation. 
Details can be verified on a hoof before, during or after the
search for a head; whatever is expedient.

Approach

We expect our progress on this project to reflect the
structure of the results. That is, both will proceed by a
process of "successive approximation".

Professor Papert has characterized one approach to problem 
solving as "neglect all complication, try something". The 
results may serve as a "plan" on which to base further 
refinements or a basis for "debugging". 

There is a distinction between simplifying the problem and 
simplifying the solution. One technique involves splitting the 
problem into a number of separate or successive simplified
problems. For example: ignore shadows, texture, etc., and just
consider ideally lighted ideal planes, then consider shadows,
then consider texture, but no shadows, etc. This approach can be
very fruitful. However, it may leave us far from a solution to
our total problem, particularly if the various aspects of the
problem are related in a non-linear fashion. In this case it may
prove necessary at times to approach the total problem with a 
simplified solution. We can then rework this solution 
successively to approach a more adequate solution. The 
intermediate results serve to indicate directions for 
improvement. We may "project" structural aspects of the problem 
from the solution, rather than relying on an a priori division of 
the problem.

We expect a continuing dialogue during the research that
reflects and suggests the dialogue that will be built into the
system:

>Is that side straight? 

>I can't tell; it peters out in spots.

>Oh really? Maybe they're lined up wrinkles. Are there
more of them? 

>Could be. 

>Can you characterize their profile? 

>Yes, I can give you a "possible wrinkle" predicate; but 
that predicate will also work on soft corners. 

>But the surrounding planes for a wrinkle will be similar? 

>Yes, and aligned shadows often occur with wrinkles. 

>Now can you ... 

>No but I can ... 

>Well then if I tell you ..., then could you ... 

The important thing is that many of the technical realities 
of the system will flow from real experience with the problems of 
processing real data. This does not mean that the results will 
lack theoretical content, but that the theory will bear a valid 
relationship to the reality of the vision problem. We want the 
theory to flow from and reflect the structure of visual 
experience, rather than attempt to distort visual data to conform 
to preconceived and inappropriate analytic theories. 

Benchmarks 

We might proceed in several phases. A possible sequence 
would be as follows. 

As a first stage exercise, we could consider the development 
of a system that dealt with simple geometric models, e.g. cube, 
cylinder. This system would provide experience in melding high 
level knowledge with a suitably varied set of low level 
predicates. The idea is not to push any one predicate or 
approach to its limit but to allow the extent of higher level 
knowledge and variety of low level approaches to work back and 
forth, to zero in on the correct perception. 

A brief example is a simple cube. We avoid working very 
hard to find the precise edges from a feature point analysis. 
Rather we obtain some rough regions with a crude homogeneity 
predicate (that averages out not only surface noise but 
textures). We massage these a bit and get a rough idea of their 
shape. If we suspect a parallelepiped, we may apply a finer test 
to verify this shape. We find enough to guess a cube or at least 
a planar object. We then apply a crude line verifier over the 
wide band between regions. Meeting success we may home in on the 
lines with a sensitive line verifier in a narrow region. (If 
some regions were lumped into one at the start we use our models 
to guide predicates in a parsing attempt (4).) 

This example already illustrates several interesting points. 
Unlike a pass oriented system, we do not have to apply all our 
predicates at once, but only as (and where) needed. We make use 
of broad based, region oriented predicates to avoid the problems 
of high sensitivity until we have some hypotheses to guide the 
search for finer features. By working down from the general to 
the specific we avoid losing the forest for the weeds. And even 
at the lowest level we are still dealing with a broad based 
predicate, a line verifier. We also avoid an initial commitment 
to a world model of straight line planar objects. 

The next stage would provide experience in dealing with a 
complex of surfaces in nonlinear terms. A limited set of "real" 
objects, a set of tools for example, would provide a miniworld. 
The second stage system would be able to function in this world. 
Here we would have to incorporate a greater variety of data types 
and predicates or procedures. 

Another simple example is a hammer. We find an initial 
region. We suspect and verify a roughly rectangular shape. 
Relative length versus width prompts a handle hypothesis (or vice 
versa). This initiates searches for regions at either end, 
either crosswise or colinear to the handle axis (we could have a 
screwdriver). We find a chunky region crosswise at one end. We 
hypothesize a hammer with handle and head. We move back down to 
verify a few fine details to assure our success. 

Notice here some further principles emerging. Precisely 
what is required to see a hammer head varies depending on where 
or from what direction you "enter" the observation. Think, for 
example, how carefully you would have to draw a hammer head to 
make it identifiable in the following contexts: at the end of a 
roughly drawn hammer handle, standing alone by itself, at the end 
of a carefully drawn hammer handle, in a picture of the head 
alone to be viewed only partially and in isolated pieces. The 
requirements are different if we move "down" from a knowledge, or 
hypothesis, of "something at the end of a hammer", or if we move 
"up" from a detailed picture of the contour of a piece of the 
head. Generally we have a range of redundant information that 
specifies an object; any particular set of details, e.g. of 
shape, are not always required for identification, and may be 
easier to verify than to obtain by "preprocessing". 

A flexibly programmed "understanding" of hammer heads should 
be able to be "entered" at several levels. Processing should 
proceed in parallel, with mutual calls, modified by the current 
state of knowledge. 

It may be appropriate here to initiate a Charniak-type study 
of "everything we need to know" about, e.g. hammers, in order to 
perceive them in varied contexts. Why should such extensive 
knowledge not be necessary to "understand" in visual contexts, 
just as it is required in verbal contexts? 

Another miniworld would be established to provide experience 
in dealing with classes of objects. A set of saws, for example, 
or a set of doll house furniture, could form such a class. The 
third stage effort would provide a system that could move easily 
through the perceptual space of this class. 

Some of Winograd's ideas on organizing knowledge might be 
useful here. A systemic grammar might be a useful metaphor for 
at least one aspect of the organization. Also some convenient 
method of inputting knowledge would be useful at this point. 
Ideally the system would be such that someone could eventually 
hook up the output of a Winston learning program to the input of 
this system. (And the output of this system to the input of a 
Winston learning program, of course.) 

Finally, a rich miniworld would be chosen to combine our 
previous experience in a more varied environment. A cutaway doll 
house, or a multipurpose workbench, for example. By the 
completion of this last stage our general principles for an 
integrated perceptual system should have been demonstrated, and 
new principles for implementing this integration should have 
emerged. 

This is rather an ambitious program. We could only hope to 
lay the groundwork really, to build an instructive testimonial to 
the possibilities derivable from a proper conjunction of high and 
low level knowledge. 

Summary 

In summary, in order to make a quantum jump in vision 
research, a number of bold steps are required. The thinking of 
pre-eminent theoreticians in the field has long tried to push us 
in these directions. However, "practical" considerations have 
too long held us back. 

Instead of studying the parts and then deriving the "glue", 
we must study the glue and then derive the parts. 

It has for some time now been recognized that the real 
problems in vision lie in understanding the cooperation of the 
various subprocesses. In practice, it has been easier to define 
and deal with specific pieces of the spectrum of visual 
knowledge. However, we eventually end up sitting around 
bemoaning the difficulty of putting the pieces together. 

The solution is not simply to learn more about more pieces. 
Neither is it to treat the pieces we have as "black boxes" to be 
tied together. Aside from the theoretical arguments that could 
be mounted against this approach, it has proven rather barren so 
far in practice. To learn something about the mysterious 
"binding energies" as it were, we must simply grit our teeth and 
attack the problem directly. This approach will often mean 
messy, tentative, ad hoc progress. We must be willing to pay 
that price. We may utilize our hard won piecemeal knowledge, of 
course. However, when we have finally won our way to some 
understanding of the interactive process of vision, we may find 
ourselves in a better position to identify the essential 
subprocesses involved. 

Instead of generating data types and predicates to handle an 
idealized data base we must generate idealized data types and 
predicates to handle a real data base. 

Becoming absolute experts on line drawings will only qualify 
us as experts in graph theory. The humor of this approach is 
evident when we consider that "real" physical data is 
manufactured at great cost and effort to match this idealized 
data base, and even then the predicates derived for the line 
drawing model cannot succeed in producing a satisfactory 
translation of the physical objects into line drawing data types. 
Our line drawing studies have provided us with some useful 
methodologies and results that should provide guidance and 
submodules for a more general study. However it is time to 
return for inspiration and guidance to more realistic data: not 
to set up another panacea predicate, but to extract whatever 
information is required to organize the sensory input; not to 
organize simply and dogmatically as lines or corners, but as 
recognizable perceptions, as is appropriate for the data. 

Most fundamentally, the easy but artificial distinction 
between seeing, describing, and recognizing must be broken down. 
We cannot merely organize the visual input according to some 
"neutral" data base. We must not partition out part of our 
knowledge to function as a template to "match" our results. The 
only way we can hope to organize the visual data to "match" a 
high level form is to use that high level knowledge to perform 
the organization. The concept of a "model" as such is outdated. 
The medium is the message. Recognition is the process of 
description. Description is the process of recognition. 

A HETERARCHICAL PROGRAM FOR RECOGNITION OF POLYHEDRA 

by Yoshiaki SHIRAI 



ABSTRACT 

Recognition of polyhedra by a heterarchical program is 
presented. The program is based on the strategy of recognizing 
objects step by step, at each time making use of the previous 
results. At each stage, the most obvious and simple assumption 
is made and the assumption is tested. To find a line segment, a 
range of search is proposed. Once a line segment is found, more 
of the line is determined by tracking along it. Whenever a new 
fact is found, the program tries to reinterpret the scene taking 
the obtained information into consideration. Results of the 
experiment using an image dissector are satisfactory for scenes 
containing a few blocks and wedges. Some limitations of the 
present program and proposals for future developments are 
described. 

1. INTRODUCTION 

We do not know how to make a program to recognize objects 
visually as well as a human being. One of the shortcomings of 
many computer programs is, as Minsky has pointed out, their 
hierarchical structure. A human may recognize objects in the 
context of the environment. The environment may be recognized 
based on his a priori knowledge. The recognition procedure is, 
however, well programmed so that the simple obvious parts are 
recognized first and the recognition proceeds to the more 
complicated details based on the previous results. 

The work in this paper studies an example of a heterarchical 
program to recognize polyhedra with an image dissector. Most 
previous works begin by trying to find feature points in an entire 
scene and make a complete line drawing. It is very difficult to 
get a complete line drawing without knowledge about a scene. If 
the line drawing has some errors, the recognition by a theory 
based on the assumption of the complete line drawing such as 
Guzman's might make still more serious mistakes. Our work is 
an attempt to recognize objects step by step, at each time making 
use of the previous results. 

We assume in this paper that the difference in brightness 
between objects and the background is large enough to detect the 
boundary approximately. At present, this program works for 
recognizing moderately complicated configurations of blocks and 
wedges. The limitations and proposals for future development 
are described later. 

2. GENERAL STRATEGY 

2.1 Priority Of Processing 

For convenience, we set up the edges of the objects in a 
scene as falling into 3 classes. A line formed at the boundary 
between the bodies and the outer background is a contour line of 
the bodies. In Fig.1, lines AB, BC, CD, DE, EF, FG, GH, HI, IJ, 
JK, KL, LM, MN, NO, AO, VW, WX, XY, YZ and ZV are contour lines. 
A boundary line is a line on the border of an object. Contour 
lines are boundary lines. In Fig.1, the boundary lines are the 
contour lines and lines on the boundary between two bodies, i.e. 
CP, PH, IQ, QR, and RM. An internal line occurs at the 
intersection of two planes of the same body. Lines JS, LS, QS, 
PT, NT, AT, FU, GU, DU and XV are internal lines. 

The global strategy is shown in Fig. 2. At first, the 
contour lines are extracted (because we assume a priori enough 
contrast between the objects and the background). If more than 
one contour is found, as in Fig.1, one contour for bodies B1, B2, 
B3 and another for body B4, then the boundary lines and internal 
lines are searched one by one for each contour. The global 
strategy in block 2 in Fig. 2 is as follows. 

A) Find boundary lines before finding internal lines because 
boundary lines often give good cues to guess internal lines. 
Note that to find boundary lines implies to find bodies. 

B) In searching for lines, different situations require 
examination of larger or smaller areas. In our strategy, the 
smaller the area required to search lines, the higher priority 
we give to that search. In Fig.1, for instance, to determine 
the existence of an extension of line BC, it is enough to search 
a small area whose center is on the extension of the line. To 
find line IC, however, we should consider all possible 
directions of a line between IP and IJ. Thus the former search 
has priority over the latter. 

Fig. 1. Example of Scene 
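
The rule in (B) amounts to ordering the proposed searches by the area each one must examine. A minimal Python sketch of that ordering follows. The function name, the (area, description) pairs and the area figures are all hypothetical and chosen only for illustration; they are not taken from the program described here.

```python
import heapq

def order_searches(proposals):
    """Return proposals sorted so that the smallest search area comes first.

    proposals: list of (area, description) pairs (hypothetical format).
    The running index i breaks ties so that equal areas keep their
    original order.
    """
    heap = [(area, i, description)
            for i, (area, description) in enumerate(proposals)]
    heapq.heapify(heap)
    ordered = []
    while heap:
        area, _, description = heapq.heappop(heap)
        ordered.append((area, description))
    return ordered

# Illustrative numbers only: a narrow search along a known direction
# versus a wide search over all directions between two known lines.
proposals = [
    (400, "line from I, direction unknown"),
    (40, "extension of line BC"),
]
ordered = order_searches(proposals)
```

With these made-up areas the extension of BC is tried first, which is the behaviour the paragraph describes.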

The priority to extract the most obvious information first 
is in the following order. 

1. If two boundary lines make a concave point (such as 
point B in Fig.?), try to find an extension of them. If only 
one extension is found, track along this line. Most of such 
cases are like in Fig. 3 (b) where one body hides the other. 
We can determine to which side of body this line belongs. 

2. If no extensions of two concave lines are found, try 
to find another line which starts from the concave point. If 
only one line is found, track along this line. Most of these 
cases are as in Fig.3 (c) where it is not clear locally to 
which body this line belongs. In Fig. 3 (c), line BD belongs 
to the upper body, but this is not always true. That is, the 
lines AB, BC, BD are not sufficient to decide the relation. 

3. If both extensions of two lines are found at a 
concave point, try to find a third one. If only one line is 
found, track along this line. This is the case as shown in 
Fig. 3 (d) where the third line is the boundary line. 

Fig. 3. Examples of concave boundary lines 

Whenever tracking terminates, an attempt is always made to 
connect the new line to the other lines that were already found. 
If more than one line segment is found in (1), (2) or (3), the 
tracking of those lines is put off, hopefully to be clarified by 
the results of knowledge obtained in simpler cases. Fig.4 
illustrates two extensions found at concave point P. The 
interpretation of the two lines is put off to treat simpler cases 
first. That is, one would continue examining the contour and 
lines AE and CD might be found next; then, by a circular search 
at point E (which is explained later), line EP would be found. 
At this stage it is easier to interpret lines AB and EP as 
boundary lines which separate two bodies. Then line DP might be 
found similarly and interpreted correctly. 

4. If an end of a boundary line is left unconnected as 
PQ in Fig. 5, try to find the line starting from the end point 
(Q in this example) by circular search. If multiple lines 
are found, try to decide which line is the boundary. If a 
boundary line is determined, track along it. In Fig. 5, the 
dotted lines are found by circular search and the arrows show 
the boundary lines to be tracked. 

5. If no line is found in the case (4) as stated above, 
extend the line (PQ in this example) by a certain length and 
test if the line is connected to other lines. If not, then 
apply circular search again as in (4). This is necessary 
because the termination point of the tracking is not always 
precise. 

Note that this process can be repeated until successful 
(that is either the line is connected to other lines or line 
segments are found by circular search). 

6. If the boundary lines of a body are known, select the 
vertices of the boundary that might have internal lines 
starting from them. The selection of vertices is based on 
heuristics such as selecting upper right vertex rather than 
lower right vertex. At each vertex, try to find an internal 
line which is nearly parallel to other boundary lines. If 
one line is found, track along it. In Fig.1, for example, 
internal line JS is parallel to the boundary line KL or IQ, 
and QS is parallel to Rl or IJ. Line PU is parallel to FD 
and XV is parallel to XZ. Thus it is often useful to find 
internal lines parallel to boundary lines of the same body. 
Note that a search for parallels covers only a small area. 

7. If no line is found in (6), try to find one by 
circular search between adjacent boundary lines. When one 
line is found, track along it. In Fig. 6, circular search 
between BA and BC is necessary to find the internal line BE. 

8. If two internal lines meet at a vertex, try to find 
another internal line starting at the vertex. This process 
is used in two cases. One is where no internal line was 
found in (7) because of little difference in brightness 
between adjacent faces. Suppose in Fig.1, that the internal 
line SJ was not found at vertex J, but that LS and QS were 
found. Then try to find an internal line starting at S 
toward J. If there is enough contrast near S, a line segment 
is found. The other case is where a body is partly hidden by 
other bodies. In Fig. 6, the triangular prism is partly 
hidden. After BE and CE are found, EF is searched for. In 
both cases, the direction of the line is sometimes 
predictable and sometimes not. If it is predictable, then 
try that direction. If it is unpredictable or if the 
predicted direction failed, then apply circular search 
between the two internal lines. If one line is found, track 
along it. 

9. If an end of an internal line is not connected to any 
line, try to find lines starting from the end by circular 
search. If lines are found, track along them one by one. 

10. If no line is found in (9), extend the line by a 
certain length as in (5) and test if it is connected to other 
lines. If not connected, try circular search again as in 
(9). This process can also be repeated until successful. 
Fig. 7 illustrates this process. In Fig.7 (a), line MN' is 
not connected to others at N', thus step (9) is tried at N' 
and fails. The line is extended to P and (9) is again 
applied. This process is repeated until the line is 
connected to line KI at N. Fig.7 (b) shows that line HI is 
extended by this process to P where a new line is found by 
circular search. Similarly line CG is extended to F. This 
process is useful so as to not miss a new body sitting on an 
obscure edge. 
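
Steps (9) and (10), like steps (4) and (5), form a loop: search around the end point, and if nothing is found, extend the line a little, test for a connection, and search again. The Python sketch below shows only that control flow. The two callbacks `circular_search` and `is_connected`, the return labels, and the `max_tries` limit are assumptions made for illustration; the text says only that the process is repeated until successful.

```python
def extend_until_connected(end, direction, step,
                           circular_search, is_connected, max_tries=10):
    """Alternate circular search and extension from an unconnected end.

    end:             (x, y) where tracking terminated
    direction:       unit vector (dx, dy) of the tracked line
    step:            extension length per trial ("a certain length")
    circular_search: hypothetical callback, point -> list of lines found
    is_connected:    hypothetical callback, point -> bool
    """
    x, y = end
    dx, dy = direction
    for _ in range(max_tries):
        found = circular_search((x, y))          # step (9)
        if found:
            return "lines_found", (x, y), found
        x, y = x + step * dx, y + step * dy      # step (10): extend
        if is_connected((x, y)):
            return "connected", (x, y), []
    return "gave_up", (x, y), []

# Toy usage: nothing is ever found by circular search, and the line
# counts as connected once it has been extended past x = 5.
result = extend_until_connected(
    end=(0, 0), direction=(1, 0), step=2,
    circular_search=lambda p: [],
    is_connected=lambda p: p[0] >= 5,
)
```

In the toy run the line is extended three times before it connects, so `result` is `("connected", (6, 0), [])`.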

At each stage when an above step is finished, the obtained 
information is interpreted as shown in Fig.2 (block 3). For 
instance, if tracking along a line terminates, a test is made 
whether the line is an extension of other lines and/or the line 
is connected to other lines at a vertex. If a boundary line is 
connected to another boundary line, the body having the lines is 
split into two bodies and the properties of both lines and 
vertices are stored in an appropriate structure. In Fig.8, for 
example, line N'P' is obtained by tracking starting at point N. 
This line is interpreted as an extension of HN, and HN and N'P' 
are merged into one straight line using the equations of these 
two lines. Then, it is connected to CO and Fig. 8 (b) is 
obtained. Before the line was connected to CO, there were two 
bodies B1 and B2 as in Fig.8 (a). Now body B2 is split into two 
bodies B2 and B3. We can interpret line NO as the boundary of B3 
which hides a part of B2. The other properties of lines and 
vertices are obtained similarly at this stage. 

2.2 Example 

We illustrate the entire line-finding procedure with the aid 
of the example shown in Fig. 9. At first, the contour lines AB, 
BC, CD, DE, EF, FG, GH, HI, IJ, JK and KA are obtained as shown 
in Fig. 9 (a). Step (1) described in the previous section is 
tried for the concave points G and J. In this example, the 
position of G is not precise enough to find the extension of FG. 
On the other hand, a line segment is found as an extension of the 
line KJ. KJ is extended by tracking as far as L. Because there 
is no other point to which step (1) is applicable, step (2) is 
tried for point G. One line segment is found and extended till 
tracking terminates. Thus a line G'M' is obtained as in Fig. 9 
(b). This line is interpreted as an extension of FG and 
connected to JL. Then the positions of points F, G and L are 
adjusted as shown in Fig. 9 (c). Now two bodies B1 and B2 are created 
by the boundary lines GL and JL. It is important to notice this, 
for it means that step (1) is again applicable (to point L) at 
this stage. Thus line FL is extended as far as M in Fig. 9 (d). 
(Note that line MN' has not yet been found.) LM is interpreted 
as an extension of FL but the end point M is not connected to any 
other lines. Thus vertices F, G, L and end point M are adjusted 
considering the new line LM. Here neither step (1), (2) nor (3) 
is applicable, so that (4) is now applied to M. Three lines are 
found by circular search as in Fig. 9 (d). MN' is determined as a 
boundary line and extended by tracking. When it terminates, the 
line is connected to boundary line BC at vertex N as in Fig. 9 
(e). Body B1 splits into bodies B1 and B3. It is known at this 
stage that B1 is hidden by B3 and B2 is hidden partly by B3 and 
partly by B1. Next, step (6) is applied to each body one by one, 
at each time selecting the easiest body for proposing the 
internal lines (in this example, the order is B3, B1, B2 because 
B3 hides B1 which hides B2). Internal lines CO and MO are found 
and connected at vertex O, but no line segment is found using 
steps (6) and (7) applied to vertex E (this stage is shown in 
Fig.9 (e)). Step (8) is applied to vertex O and a line segment 
toward E is found. This is extended by tracking as far as E' as 
in Fig.9 (f). Line OE' fails to be connected to any other lines, 
which activates step (10). After a few trials, OE' is extended 
to connect to vertex E. Similarly, internal line AM is obtained 
for body B1 and line IP is obtained for B2. When every step has 
finished, three bodies are known together with the relationships 
between them. 

3. ALGORITHMS 

This section describes the details of the algorithms that 
are used in finding contours and in the steps stated in section 
2. Some of them, such as tracking and circular search, are used 
in more than one step. An algorithm used in more than one step 
may be slightly different in each step but its essential part is 
not changed. In the tracking algorithm, for example, some 
changes occur depending on whether tracking is used for boundary 
lines or internal lines. 

3.1 Contour Finding 

Fig. 10 shows the outline of the procedure to find contour 
lines. The picture data obtained with an image dissector 
usually consists of a large number of points (say about 100,000) 
each of which represents light intensity level. To speed up the 
processing, one point for every 8x8 points is sampled. This 
compressed picture data consists of 1/64 the number of points in 
the original picture. To find the contour, this data is scanned 
till a contour point is found. The judgement of contour point is 
based upon the simple assumption that there is enough contrast 
between the background and objects. It is then checked whether 
or not the point is a noise point. If it is a real contour 
point, the contour is traced from it. Thus a set of contour 
points is found. Then, the picture data is again scanned until a new 
contour point is found. This process is repeated for all the 
picture data. When all the sets of contour points have been 
found, each set is separately analysed. 
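
The coarse pass can be sketched in Python as follows. This is only an illustration under stated assumptions: the picture is a list of rows of intensities, the background level is known, and a point counts as noise when none of its four neighbours also contrasts with the background. The noise test in particular is our own stand-in, since the text does not say how noise points are recognized. Contour tracing itself is not shown.

```python
def subsample(image, k=8):
    """Keep one point of every k x k block (the top-left one)."""
    return [row[::k] for row in image[::k]]

def first_contour_point(image, background, threshold):
    """Scan row by row and return (x, y) of the first point that
    contrasts with the background and is not isolated, or None."""
    height, width = len(image), len(image[0])

    def contrasts(x, y):
        return abs(image[y][x] - background) > threshold

    for y in range(height):
        for x in range(width):
            if not contrasts(x, y):
                continue
            neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
            # Assumed noise test: require one contrasting 4-neighbour.
            if any(0 <= nx < width and 0 <= ny < height and contrasts(nx, ny)
                   for nx, ny in neighbours):
                return (x, y)
    return None

# Toy picture: dark background, one bright block, one isolated bright point.
picture = [[0] * 32 for _ in range(32)]
for y in range(8, 24):
    for x in range(16, 32):
        picture[y][x] = 100
picture[0][0] = 100                      # isolated point, treated as noise

coarse = subsample(picture)              # 4 x 4 points, 1/64 of the original
point = first_contour_point(coarse, background=0, threshold=50)
guess = (point[0] * 8, point[1] * 8)     # approximate position at full resolution
```

The last line gives the approximate full-resolution position near which the precise boundary point would then be searched for.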

Suppose a certain set of contour points is to be analysed. 
We now return to the original high-resolution picture. We can 
guess approximately the position of the boundary point in the 
original picture data which corresponds to the first point found 
in the sampled picture. The precise boundary point is searched 
for near this point. A set of contour points is obtained by 
tracing from this point in the same way as in the sampled picture. 
A polygon is formed after we connect contour points one by one. 
To classify the points of this 'curve' into segments, the 
'curvature' of the polygon is used. This curvature in a digital 
picture is defined here for convenience as shown in Fig. 11. Each 
cell in the figure represents a contour point. The curvature of 
a point P is defined to be the difference in angle between PR and 
PQ (α), where Q and R are a constant number of points away from P 
(6 points in this case). If we plot the curvature along the 
contour as shown in Fig. 12, we can tell what part is near a 
vertex and what part lies in a straight line. 
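
The curvature measure can be sketched in Python as follows. We assume here that Q lies k points before P and R lies k points after P along a closed contour, and we take the curvature to be the turning angle between the direction from Q to P and the direction from P to R, so that a straight run gives zero. That reading of Fig. 11 is an assumption on our part, and the function name is hypothetical.

```python
import math

def curvature(points, k=6):
    """Turning angle (radians, in [-pi, pi)) at each point of a closed
    contour given as a list of (x, y) in order along the contour."""
    n = len(points)
    result = []
    for i in range(n):
        qx, qy = points[(i - k) % n]     # k points behind P
        px, py = points[i]
        rx, ry = points[(i + k) % n]     # k points ahead of P
        angle_in = math.atan2(py - qy, px - qx)
        angle_out = math.atan2(ry - py, rx - px)
        turn = angle_out - angle_in
        turn = (turn + math.pi) % (2 * math.pi) - math.pi   # wrap the angle
        result.append(turn)
    return result

# Toy contour: a 20 x 20 square traversed counterclockwise, one point per cell.
square = ([(x, 0) for x in range(20)] + [(20, y) for y in range(20)]
          + [(x, 20) for x in range(20, 0, -1)]
          + [(0, y) for y in range(20, 0, -1)])
curv = curvature(square)
```

On this square the value is zero in the middle of a side and a quarter turn at a corner, with intermediate values for the points within k of the corner.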

Fig. 11. Illustrates the definition of curvature 

Note that curvature is not very sensitive to noise or 
digitization error. If we integrate the curvature in part of a 
straight line, the result is nearly zero despite the effect of 
noise. If we sum up the curvature of consecutive points whose 
absolute value is greater than some threshold, we can determine 
the existence of a vertex. That is, if this sum of the curvature 
of such points exceeds a certain threshold, there is a vertex 
near those points. Thus every contour point is classified to be 
either in the straight part of a line or near a vertex. Using 
points which belong to the straight part of a line, the equation 
of the line is calculated. Then each vertex is decided as an 
intersection of two adjacent lines. 
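
A Python sketch of the two operations in this paragraph follows: grouping consecutive high-curvature points into vertex regions, and placing a vertex at the intersection of two fitted lines. The function names, the threshold values and the (a, b, c) line representation (a x + b y = c) are our own choices for illustration. For brevity the sketch treats the curvature sequence as open, so a run that wraps around the start of a closed contour is not joined up.

```python
def vertex_runs(curv, point_threshold, sum_threshold):
    """Return (first, last) index pairs of runs of consecutive points with
    |curvature| > point_threshold whose summed curvature exceeds
    sum_threshold in absolute value."""
    runs, start, total = [], None, 0.0
    for i, c in enumerate(list(curv) + [0.0]):   # trailing 0.0 closes a final run
        if abs(c) > point_threshold:
            if start is None:
                start, total = i, 0.0
            total += c
        elif start is not None:
            if abs(total) > sum_threshold:
                runs.append((start, i - 1))
            start = None
    return runs

def intersect(line1, line2):
    """Intersection of two lines a*x + b*y = c, or None if parallel."""
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Illustrative values: one genuine corner and one small isolated bump.
curv = [0.0, 0.02, 0.3, 0.5, 0.4, 0.01, 0.0, 0.15, 0.0]
runs = vertex_runs(curv, point_threshold=0.1, sum_threshold=1.0)

# The lines y = 0 and x = 20 meet at the vertex (20, 0).
vertex = intersect((0, 1, 0), (1, 0, 20))
```

Points outside the returned runs are the ones that would be used to fit the straight lines on either side of each vertex.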

3.2 Line Segment Detection 

A line segment is detected given its direction and starting 
point. This procedure is used in most of the steps stated in 
section 2. The procedure consists of two parts. One is to 
detect the possible feature points which are to be regarded as 
elements of the line. The other is to test whether or not 
obtained feature points make a line segment. 

In detecting feature points, we should consider various 
types of edges. Herskovits and Binford classified the light 
intensity profiles across an edge into 3 types, namely step, roof 
and edge-effect, and proposed 3 types of boundary detectors. In 
this paper, a roof type detector is not considered because roof 
type edges can be detected by a step detector or an edge-effect 
detector. In addition, most roof type edges are accompanied by 
step or edge-effect types. We set up local Cartesian coordinates 
U-V such that U is the direction of the line segment to be 
detected. Let I(u,v) denote the light intensity at point (u,v), 
and define the contrast function F_u(v) at (u,v) as 

F_u(v) = \sum_i \sum_j [ I(u+i, v+j) - I(u+i, v-j) ] 

Suppose we have an intensity profile as shown in Fig. 13 (a). 
F_u(v) at P in Fig. 13 (b) is the difference of summed intensity 
between area A_2 and A_1. F_u(v) for a typical step type profile 
(Fig. 13 (a)) is shown in Fig. 13 (c) in which the edge is detected 
as the peak. The typical profiles of F_u(v) for other types are 
shown in Fig. 14 where the edge is detected as the middle point 
between positive and negative peaks. 
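
The contrast function can be sketched in Python as a difference of two summed areas on either side of the candidate edge. For simplicity the sketch assumes that U runs along the image rows and V along the columns, so that I(u, v) is `image[v][u]`. The extents of the two areas (`half_len` along U, `width` along V) are illustrative assumptions, as are the names.

```python
def contrast(image, u, v, half_len=2, width=3):
    """Summed intensity just above v minus summed intensity just below v,
    over a strip of 2*half_len + 1 points centred on u."""
    total = 0
    for i in range(-half_len, half_len + 1):
        for j in range(1, width + 1):
            total += image[v + j][u + i] - image[v - j][u + i]
    return total

# Toy step edge: rows 0..9 have intensity 10, rows 10..19 have intensity 50.
image = [[10] * 20 for _ in range(10)] + [[50] * 20 for _ in range(10)]
profile = [contrast(image, 10, v) for v in range(3, 17)]
```

For this ideal step the profile is zero away from the edge and rises to a single maximum where the two areas straddle it, as in the step type curve of Fig. 13 (c).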

The basic procedure, therefore, is to detect the peak of 
F_u(v) and its position. The necessary properties for a peak are 
as follows (see Fig. 15). 

(a) If F_u(v) ranges from v_l to v_r, there must exist the 
maximum of F_u(v) at v_m, other than v_l or v_r. 

(b) F_u(v_m) > f_m where f_m is a threshold. 

(c) There must exist a minimum of F_u(v) at v_1 between v_l and v_m 
and a minimum of F_u(v) at v_2 between v_m and v_r such that 

F_u(v_m) - F_u(v_1) > f_d 
F_u(v_m) - F_u(v_2) > f_d    where f_d is a threshold. 

If such v_m is found, the left of the peak (v_3) and the right 
of the peak (v_4) are determined as the intersection of F_u(v) with 
the line F_u(v) = f_t as shown in Fig. 15. The value of f_t depends 
on F_u(v_m) and is represented as 

f_t = c_1 F_u(v_m) + c_2 

where c_1 and c_2 are constants and 0 < c_1 < 1, c_2 > 0. 
The position of the peak is obtained as the middle of v_3 and 
v_4. If more than one peak is found between v_l and v_r, the 
peak which is nearest to the middle of v_l and v_r is adopted. 
A negative peak is similarly detected. A feature point for an 
edge-effect or roof is obtained as the middle of the positive and 
negative peaks, if both are found (although the threshold f_m is 
not the same as in the simple positive or negative peak 
detection). This method for the detection of peaks and 
positions is not appreciably affected by noise. 
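
The peak test can be sketched in Python for a sampled profile. The sketch handles a single positive peak only: it takes the largest sample as the candidate maximum, applies conditions (a) to (c), and then takes the peak position as the middle of the outermost samples around the maximum that stay at or above the cut level. Working on whole samples rather than interpolating the intersections, the default constants, and all names are assumptions made for illustration.

```python
def find_peak(F, f_m, f_d, c1=0.5, c2=1.0):
    """Return the position (index, possibly fractional) of a positive peak
    in the sampled profile F, or None if the conditions are not met."""
    m = max(range(len(F)), key=F.__getitem__)
    if m == 0 or m == len(F) - 1:            # (a) maximum not at either end
        return None
    if F[m] <= f_m:                          # (b) peak high enough
        return None
    if F[m] - min(F[:m]) <= f_d:             # (c) minimum on the left
        return None
    if F[m] - min(F[m + 1:]) <= f_d:         # (c) minimum on the right
        return None
    cut = c1 * F[m] + c2                     # level at which the peak is cut
    left = m
    while left > 0 and F[left - 1] >= cut:
        left -= 1
    right = m
    while right < len(F) - 1 and F[right + 1] >= cut:
        right += 1
    return (left + right) / 2

# Illustrative profile with a flat-topped peak between samples 4 and 5.
profile = [0, 0, 200, 400, 600, 600, 400, 200, 0, 0]
position = find_peak(profile, f_m=300, f_d=200)
```

Taking the middle of the two cut points, rather than the largest sample itself, is what places the peak of this flat-topped profile between samples 4 and 5.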

The other part of the line segment detection is a test of 
the co-linearity for the detected feature points. Suppose 
concave boundary lines L and L_1 meet at P and suppose the line 
segment extending L is tested as shown in Fig. 16. Feature 
points are detected in a rectangular search area with given 
length and width whose direction is equal to that of L (= U), at 
an appropriate place where the detection of feature points is not 
affected by the edge corresponding to L_1. Feature points are 
detected along the direction V at the center points P_1, P_2, ..., P_m 
sequentially. If positive peaks are found at M_1, M_2, ..., M_n as 
shown in the figure, the linearity of the points is tested as 
follows. 

(a) The number of the feature points must exceed a threshold 
number n. 

(b) The deviation σ of the points in line fitting with the 
least square method should be less than a threshold σ_t. 

(c) Let U' denote the direction of line segment obtained by 
line fitting; then 

|U' - U| < u_d 

where |U' - U| denotes the difference in directions U' and U. 
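
The three tests can be sketched in Python as follows. Feature points are given in the local (u, v) coordinates, so the expected direction U corresponds to zero slope and the direction difference is the arctangent of the fitted slope. The deviation is taken here as the root mean square residual in v; that choice, the parameter names and the sample thresholds are assumptions. The fit assumes the points do not all share the same u, which holds when they come from distinct center points along U.

```python
import math

def fit_line(points):
    """Least squares fit v = intercept + slope * u.
    Returns (slope, intercept, rms_residual)."""
    n = len(points)
    mean_u = sum(u for u, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    s_uu = sum((u - mean_u) ** 2 for u, _ in points)
    s_uv = sum((u - mean_u) * (v - mean_v) for u, v in points)
    slope = s_uv / s_uu
    intercept = mean_v - slope * mean_u
    rms = math.sqrt(sum((v - (intercept + slope * u)) ** 2
                        for u, v in points) / n)
    return slope, intercept, rms

def is_line_segment(points, n_min, sigma_t, u_d):
    """Apply tests (a), (b) and (c) to the feature points."""
    if len(points) <= n_min:                       # (a) enough points
        return False
    slope, _, sigma = fit_line(points)
    if sigma >= sigma_t:                           # (b) small deviation
        return False
    return abs(math.atan(slope)) < u_d             # (c) direction close to U

# Illustrative data: points scattered about the U axis, and a diagonal run.
flat = [(0, 0.1), (1, -0.1), (2, 0.1), (3, -0.1), (4, 0.0)]
steep = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
accepted = is_line_segment(flat, n_min=3, sigma_t=0.2, u_d=0.1)
rejected = is_line_segment(steep, n_min=3, sigma_t=0.2, u_d=0.1)
```

The diagonal run fits a line perfectly but fails test (c), since its direction is far from the direction that was being searched.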

Similar tests are made for the different types of feature
points. If more than one type of line segment is found, the
selection depends on the following criteria.

(a) If an edge-effect type is found, then it is selected.

(b) For a line segment with deviation $\sigma$ and direction $u'$, let
the criterion function $C$ be

$$C = \sigma + w\,|u' - u|$$ where $w$ is a constant.

The line segment selected is the one with the smaller $C$.
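The linearity test and the criterion function can be sketched as follows. This is a minimal illustration, not the program's actual code: the function names, the use of the RMS perpendicular distance as the deviation, and the treatment of directions as undirected angles in radians are all assumptions made for the example.

```python
import math

def fit_line(points):
    """Perpendicular least-squares fit; returns (direction angle, RMS deviation)."""
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n
    sxx = sum((x - x0) ** 2 for x, _ in points)
    syy = sum((y - y0) ** 2 for _, y in points)
    sxy = sum((x - x0) * (y - y0) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    s, c = math.sin(theta), math.cos(theta)
    # Sum of squared perpendicular distances to the fitted line.
    err = s * s * sxx - 2.0 * s * c * sxy + c * c * syy
    return theta, math.sqrt(max(err, 0.0) / n)

def direction_difference(u1, u2):
    """Smallest angle between two undirected line directions."""
    d = abs(u1 - u2) % math.pi
    return min(d, math.pi - d)

def passes_linearity_test(points, u, n_min, sigma_t, u_d):
    """Conditions (a), (b) and (c) of the co-linearity test."""
    if len(points) <= n_min:
        return False
    u_fit, sigma = fit_line(points)
    return sigma < sigma_t and direction_difference(u_fit, u) < u_d

def criterion(points, u, w):
    """Criterion C: deviation plus weighted direction difference."""
    u_fit, sigma = fit_line(points)
    return sigma + w * direction_difference(u_fit, u)
```

Among several candidate segments that pass the test, the one with the smallest `criterion` value would be kept.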

3.3 Circular Search 

Circular search is used to search for lines starting at a
given point. The direction of the lines to be searched for is
not known. The range of directions in circular search depends
upon the particular case. Suppose two known lines $L_1$ and $L_2$ meet
at $P$ as in Fig. 17 (a) and suppose we wish to search for lines
lying between them. The search range $\alpha$ is between two lines $L_1'$
and $L_2'$ whose directions are slightly inside of $L_1$ and $L_2$,
respectively. If lines starting at point $P$ of line $L$ are
searched for as in Fig. 17 (b), $L_1'$ and $L_2'$ are similarly set inside
of $L$. The center point $P$ of the circular search is not always
precisely determined, especially when tracking along a line has
terminated at point $P$ as shown in Fig. 17 (b). Therefore circular
search should not be too sensitive to the position of the center
point.

It might be natural to try to detect feature points, as
defined in section 3.2, based upon $F_u(v)$ along arcs around the
center. The difficulty with this search is the classification
of feature points into line segments if there is more than one, as
shown in Fig. 18. To avoid this difficulty, a simple algorithm is
used in this paper. Its basic method is to apply line segment
detection successively in various directions. This is
illustrated in Fig. 19, where successive line segment detections
toward $u_1$, $u_2$ and $u_3$ are applied. The step of direction change
and the search areas ($A_1$, $A_2$ and $A_3$ in the figure) are determined so
that line segments of any direction near the center point can be
found. Thus successive circular search along a line as shown in
Fig. 7 can find lines starting at points between two adjacent
center points (e.g., line $L$ in Fig. 20 starting between $P_1$ and $P_2$).
The algorithm for line segment detection is the same as described
in 3.2 except with respect to thresholds and search area.
Because the search areas for different directions overlap each
other, the same line segment may be found in different searches.
Each time a line segment is found by line segment detection, a
check should be made whether or not it is the same as the one
obtained by the previous detection.

Fig. 18. Illustrates difficulty in classification
of feature points. (X represents a feature point.)

If the center point of circular search has not been
determined precisely, it is not always possible to find all the
lines starting at the given point. In Fig. 21, for example, line
$L_2$ might be missed in circular search at $P$. To avoid this
inconvenience, when line segments are found (such as $L_1$ and $L_3$ in
the figure), a new center point $P_1$ is calculated based on the
known line ($L$) and the obtained line segments ($L_1$ and $L_3$).
Then circular search is applied again at $P_1$.

3.4 Tracking 

Tracking is used when a line segment is given, to track 
along it until it terminates. The requirements for a tracking 
procedure are 1) the line should not be lost due to the effect of 
other lines or noise, and 2) the procedure should terminate as 
precisely as possible at the end of the line. These requirements 
are contradictory in that the termination condition should be
strict to satisfy the second requirement, which makes it difficult
to satisfy the first. The following algorithm is a compromise
between these requirements.

The basic procedure is to predict the location of a feature 
point and to search for it near the point using line segment 
detection. The result of the search is classified into the 
following 4 cases. 

(a) There is no feature point.

(b) A feature point is on the line.

(c) A feature point is not on the line.

(d) It is not clear whether or not a feature point is on the line.

In case (a), the detection of a feature point is similar to
line segment detection except that the type of edge is already
known, so that the thresholds stated in 3.2 can be adjusted based
on the average peak of $F_u(v)$. The decision between cases (b),
(c) and (d) is made using the distance $d$ between the point and
the line. That is,

If $d < d_1$ then case (b)

If $d > d_2$ then case (c)

If $d_1 \le d \le d_2$ then case (d)
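The four-way classification can be written as a small function. This is only a sketch: the function name, the single-letter return values and the use of `None` for "no feature point found" are assumptions for the example, and the thresholds `d1` and `d2` are passed in rather than adapted to the tracking state.

```python
def classify_feature_point(d, d1, d2):
    """Classify a search result by its distance d from the tracked line.

    d is None when no feature point was found (case a).
    """
    if d is None:
        return "a"   # no feature point
    if d < d1:
        return "b"   # on the line
    if d > d2:
        return "c"   # not on the line
    return "d"       # undecided: d1 <= d <= d2
```

A tracker would call this once per step and update its state counters according to the returned case.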

The threshold $d_1$ changes depending on the state of tracking.
The state of tracking is represented by two integers $m_1$ and $m_2$
which are set initially to 0. The values of $m_1$ and $m_2$ are changed
for each case (a), (b), (c) and (d) as follows.

(a) m t s m, + 1 

(b) If m, > m 2 , B| = i, - 1 , (where i, is a constant) 
Otherwise, m, = 0, m t = 0, and classify those feature points 

into (b) which have been clasified into case (d) 
in the previous steps of tracking. Adjust the 
equation of the line with these points and the 
present feature point 

(c) If d <, dj and Hj > m^, m, = m, - 1 
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(where d is a constant) 
Otherwise no change 
(d) m r = m^ + 1 , and if m,> m^, m,= m,- 1 

The threshold d, is represented as 
d t = d + w m m z (where d and w m are constants) 
This procedure is repeated and tracking proceeds step by step 
extending the line until the termination condition is satisfied. 
The termination condition of tracking is either 

nij > m n or m, + m^ > m^ (where m n and m^ are constants) 
The terminal point is defined as the last point classified into
case (b). Fig. 22 illustrates how this algorithm works. In
Fig. 22 (a), two lines cross at $P_0$. Tracking might finish at some
point beyond $P_0$ ($P_m$ in the figure) which satisfies the
termination condition. The terminal point of tracking is,
however, determined more precisely near $P_0$ ($P_1$ or $P_2$). In Fig. 22
(b), $P_1, P_2, P_3, P_4$ are classified into case (d), increasing the value
of $m_2$, which classifies $P_5$ into case (b). Then the line is
adjusted with these points, which are now classified into case (b),
and tracking proceeds.

Fig. 22 (c) and (d) illustrate that even if a part of the
intensity profile is disturbed by noise or other lines, tracking
does not terminate there. In Fig. 22 (d), however, if the light
intensity of the right side of $L$ changes across $L_1$, the type of
feature points might change across $L_1$. Thus feature points $P_2, P_3,
\ldots$ might not be obtained and tracking might terminate at $P_1$.
When tracking terminates, the line segment detection is applied
at the extension of the line to see if another type of line
segment is found. If found, we adjust the line equation and
tracking proceeds. If not found, tracking finally terminates at
point $P_1$ and the position of $P_1$ is adjusted with the line
equation. The above procedure often extends the line across
other lines when it terminates temporarily at their crossing, as
in Fig. 9 (b) where tracking along G'M' crosses many vertical
lines.

4. EXPERIMENTAL RESULTS AND COMMENTS

To test the program, experiments are made with cubes and
wedges having relatively uniform white surfaces placed on a black
background. The image dissector camera, used as an input device,
dissects the scene onto a 20000 x 20000 (octal) grid. In this
experiment, one point for every 8 x 8 block of grid elements is
sampled. Thus, the scene is represented by 1024 x 1024 grid
points. Objects occupy only a part of the scene. In the
typical scene, the rectangular area which includes the objects of
interest may consist of about 400 x 400 points. This area is
divided into blocks, each of which is made of 64 x 64 points, and
stored in disk memory. When the light intensity at some point is
required, a block containing the point and adjacent blocks are
stored in core memory. The core memory is accessed for the input
of the light intensity until a point outside of those blocks is
referenced.

Video input is first converted into a 10 bit digital
number which is an inverse linear measure of the light intensity.
It is then converted into a 10 bit logarithmic measure. Some
intensity level resolution is lost in the logarithmic conversion.
In this experiment the light intensity is represented by a little
less than 100 levels. The input data for a clear bright edge in
the dark background is blurred due to some limitations (mostly
defocusing). If the intensity change is a step function, there
is a transient area in the input data about 10 points wide. Thus
the resolution of the picture is regarded as 10 points. The
parameters used in line segment detection and tracking are based
upon this resolution. Features of the picture involving
resolution of less than 10 points are not usually found.

Some results are shown in Fig. 23. The difficulty or
processing time of the recognition depends not only on the
complexity of the object but also on the information extracted at
each stage. In Fig. 23 (c), for example, boundary lines SJ, KS
and QS are easily proposed as the extension of contour lines. On
the other hand, it is not easy to find boundary lines KM or LM in
Fig. 23 (c). That is, after DK and HL are found, circular search
is necessary at K and L respectively. Circular search is less
reliable in finding a line segment, and more time consuming.
Once the boundary lines are determined, all the internal lines
are proposed in both cases. But tracking along VW in Fig. 23 (c)
and EN in Fig. 23 (c) terminates in the middle. Then step (10)
stated in section 1.1 is applied. This is the most time
consuming process (about 10 times more than the simple tracking
process).

Some examples of the result of a hierarchical program are
shown in Fig. 24. Hierarchical programs may look at the whole
scene homogeneously and pick up feature points. Lines are found
with those feature points obtained in the previous stage. It is
very difficult to determine a priori the various thresholds for
detection of feature points, line fitting and connection of
lines. In this heterarchical program, it is possible to adjust
various thresholds with the context of the information obtained
previously. Furthermore the algorithm itself can be modified
case by case. (For instance, the tracking algorithm is changed
depending on whether the line is a boundary or internal.) The
results of experiments with moderately complex scenes are mostly
satisfactory. Because of the many checks for consistency of
lines and vertices, the program has a small probability of finding
false lines.

Fig. 24. Examples of Comparison Between Hierarchical
and Heterarchical Program.

However, there are some limitations of this program at
present. One of them is that bodies may be missed in some
cases. A simple example is shown in Fig. 25. The boundary lines
AB and EC in Fig. 25 (a) are not proposed though the other contour
lines and internal lines are found, because the resulting regions
are so 'neat' that no concave vertices activate step (1). In
such a case, when bodies are neatly stacked, it is necessary to
search for boundary lines which start from some points on the
boundary line. In Fig. 25 (b) body B2 is not found. To find a
body that is included in a face of another body, it is necessary
to search for line segments inside the region. Though these two
kinds of search (search along the boundary line and search in the
region) are required to find all the bodies in the scenes as
shown in Fig. 25, they are still more effective than the
exhaustive search in the entire scene. Besides, it is simpler to
interpret the scene when a line is found by those searches. This
procedure, however, is left to future work.

The other limitation of the present program is, as stated in
the introduction, that it is not always applicable to concave
objects. Fig. 26 (a) shows a simple example. Line BD is found
as an extension of line CB. If all the bodies are convex, line
BD is interpreted as the boundary line as shown in Fig. 26 (b).
This does not hold for concave bodies. In this program, line BD
is regarded as a boundary line, and then line DE can be found by
circular search at D. At this stage, however, DE should be
interpreted as an internal line of the same body instead of the
boundary line which separates the body into two. If DE is
interpreted correctly, then line BD can be determined as an
internal line. This procedure should also be implemented in the
present program.
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CONCLUSION 

A heterarchical program to recognize polyhedra is presented.
The program is based upon the strategy of recognizing objects
step by step, at each time making use of the previous results.
The order of the lines to be detected is 1) contour lines
(boundary of bodies and the background), 2) boundary lines which
are the boundary between two bodies, 3) internal lines
(intersection of two faces of the same body). Among boundary
lines or among internal lines, the 'most plausible lines' are
proposed at each stage and an attempt is made to find the line.
To find a line, the range where a line segment may exist is
proposed and it is detected in a suitable way for the proposed
range. If a proper line segment is found, the end of the line
is determined by tracking along the line. When the line is
determined, the program tries to understand the scene taking this
line into consideration. Because lines are mostly proposed
instead of found by exhaustive search in the scene, the program
is relatively effective. Results of the experiment using an
image dissector are satisfactory for scenes including a few
blocks and wedges. Although the present program has limitations,
some of them may be overcome by developments proposed here for
future work.
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LINES, LINE-SEGMENTS, REPRESENTATIONS AND CALCULATIONS: 



There are many ways of representing a line by an appropriate 
equation. A minimum of two parameters is required. Unfortunately
these parameters tend to become singular near certain angles. 
Special purpose tests must be made to handle these cases. 
The more redundant 3 and 4 parameter representations not only 
solve this problem but simplify the operations required to 
manipulate lines and points. 

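Let $\theta$ be the inclination of the line to the x-axis and $\rho$ its
perpendicular distance from the origin.

Let $(x_0,y_0)$, $(x_1,y_1)$ and $(x_2,y_2)$ be points on the line. Then
some equations for a line are:

$$y = (\tan\theta)\,x + \frac{\rho}{\cos\theta} \qquad\qquad x = (\cot\theta)\,y - \frac{\rho}{\sin\theta}$$

$$y = m x + c$$

$$\frac{x-x_1}{x_2-x_1} = \frac{y-y_1}{y_2-y_1} \qquad\qquad (x-x_1)(y_2-y_1) - (y-y_1)(x_2-x_1) = 0$$

Let $\ell^2 = (x_2-x_1)^2 + (y_2-y_1)^2$, $\cos\theta = (x_2-x_1)/\ell$ and $\sin\theta = (y_2-y_1)/\ell$. Then

$$(x-x_0)\sin\theta - (y-y_0)\cos\theta = 0$$

$$x\sin\theta - y\cos\theta + \rho = 0 \qquad\text{where}\qquad \rho = -(x_0\sin\theta - y_0\cos\theta)$$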

This last formulation is nice from a number of points of view. It 
is always non-singular, easy to use and allows the least-squares 
equations to be formulated neatly. Note that we have a choice of 
polarity, by multiplying the whole equation by -1. We can choose 
a preferred direction along or across the line in this fashion. 
This allows us to represent directed lines. 
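As an illustration, the three-parameter form can be built from two points without evaluating any trigonometric function. The sketch below assumes a tuple `(sin_t, cos_t, rho)` as the representation; the function names are made up for the example.

```python
import math

def line_through(p1, p2):
    """Return (sin_t, cos_t, rho) for the directed line from p1 to p2,
    i.e. the coefficients of  x*sin_t - y*cos_t + rho = 0."""
    (x1, y1), (x2, y2) = p1, p2
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0.0:
        raise ValueError("the two points coincide")
    cos_t = (x2 - x1) / length
    sin_t = (y2 - y1) / length
    rho = -(x1 * sin_t - y1 * cos_t)
    return sin_t, cos_t, rho

def residual(line, x, y):
    """Left-hand side of the line equation; zero for points on the line."""
    sin_t, cos_t, rho = line
    return x * sin_t - y * cos_t + rho
```

Swapping the two points multiplies all three coefficients by -1, which is the choice of polarity mentioned above; vertical lines need no special case.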

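Now we get on to using this representation. First we note that the
equation of a line at right angles through a point $(x_0,y_0)$ is:

$$(x-x_0)\cos\theta + (y-y_0)\sin\theta = 0$$

Our line and one point $(x_0,y_0)$ on it implicitly define a coordinate
transformation:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \begin{pmatrix} x-x_0 \\ y-y_0 \end{pmatrix}$$

Here $x'$ is the distance along the line and $y'$ the distance from it.
This matrix is a reflection and therefore its own inverse, so the
inverse is:

$$\begin{pmatrix} x-x_0 \\ y-y_0 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix} \begin{pmatrix} x' \\ y' \end{pmatrix}$$
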
The separation (perpendicular to the line) of two points:

$$(x_2-x_1)\sin\theta - (y_2-y_1)\cos\theta$$

In particular the distance of a point from the line is:

$$x\sin\theta - y\cos\theta + \rho$$

Not surprisingly the equation shows the line to consist of points
with zero distance from the line.

The separation (along the line) of two points:

$$(x_2-x_1)\cos\theta + (y_2-y_1)\sin\theta$$

In particular suppose that the end-points of a line segment are $(x_1,y_1)$
and $(x_2,y_2)$. Then a point lies in the band generated by projecting
this line segment perpendicular to the line if:

$$[(x-x_1)\cos\theta + (y-y_1)\sin\theta]\,[(x-x_2)\cos\theta + (y-y_2)\sin\theta] < 0$$

This can be used to determine if a point on the line lies within
the line segment from $(x_1,y_1)$ to $(x_2,y_2)$.
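These three tests translate directly into code. The sketch below assumes a line is stored as the tuple `(sin_t, cos_t, rho)`; the function names are illustrative only.

```python
def distance_from_line(line, x, y):
    """Signed perpendicular distance of (x, y) from the line."""
    sin_t, cos_t, rho = line
    return x * sin_t - y * cos_t + rho

def separation_along(line, p1, p2):
    """Signed separation of p1 and p2 measured along the line direction."""
    sin_t, cos_t, _ = line
    return (p2[0] - p1[0]) * cos_t + (p2[1] - p1[1]) * sin_t

def in_band(line, p, end1, end2):
    """True if p lies strictly inside the band swept by projecting the
    segment end1-end2 perpendicular to the line."""
    return separation_along(line, end1, p) * separation_along(line, end2, p) < 0
```

Note that the band test is strict, so a point exactly level with an end-point is reported as outside.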




The sine of the angle between two lines:

$$\Delta = \sin(\theta_1-\theta_2) = \sin\theta_1\cos\theta_2 - \cos\theta_1\sin\theta_2$$

We use this in the equation for the intersection of two lines:

$$x = (\rho_2\cos\theta_1 - \rho_1\cos\theta_2)/\Delta$$

$$y = (\rho_2\sin\theta_1 - \rho_1\sin\theta_2)/\Delta$$

Naturally if $\Delta = 0$, the lines are parallel and we lose.
To find out if two line-segments intersect, we use these equations
to find the intersection of the corresponding lines, then apply
the above "band" test twice to see if this point is inside both
line-segments.
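A minimal sketch of the intersection and of the segment test follows. It assumes lines are tuples `(sin_t, cos_t, rho)` built from their end-points; the helper names and the tolerance `eps` used to detect parallel lines are assumptions for the example.

```python
import math

def line_through(p1, p2):
    """(sin_t, cos_t, rho) of the directed line from p1 to p2."""
    length = math.hypot(p2[0] - p1[0], p2[1] - p1[1])
    cos_t = (p2[0] - p1[0]) / length
    sin_t = (p2[1] - p1[1]) / length
    return sin_t, cos_t, -(p1[0] * sin_t - p1[1] * cos_t)

def intersect(line1, line2, eps=1e-12):
    """Intersection point of two lines, or None if they are parallel."""
    s1, c1, r1 = line1
    s2, c2, r2 = line2
    delta = s1 * c2 - c1 * s2          # sine of the angle between the lines
    if abs(delta) < eps:
        return None
    return ((r2 * c1 - r1 * c2) / delta, (r2 * s1 - r1 * s2) / delta)

def in_band(line, p, end1, end2):
    """Band test: p projects strictly between end1 and end2 along the line."""
    s, c, _ = line
    t1 = (p[0] - end1[0]) * c + (p[1] - end1[1]) * s
    t2 = (p[0] - end2[0]) * c + (p[1] - end2[1]) * s
    return t1 * t2 < 0

def segments_intersect(a1, a2, b1, b2):
    """True if segment a1-a2 crosses segment b1-b2."""
    la, lb = line_through(a1, a2), line_through(b1, b2)
    p = intersect(la, lb)
    return p is not None and in_band(la, p, a1, a2) and in_band(lb, p, b1, b2)
```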

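Next, to project a point $(x,y)$ perpendicularly onto the line we perform
two opposite rotations about the origin:

$$x_1 = x\cos\theta + y\sin\theta$$

$$x_2 = x_1\cos\theta - y_1\sin\theta$$

$$y_2 = x_1\sin\theta + y_1\cos\theta$$

where $y_1 = \rho$, since in the rotated frame the line is parallel to the
$x_1$-axis at ordinate $\rho$, and $(x_2,y_2)$ is the projected point.
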
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Next we get to least-squares fitting a line to a set of points: 







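We minimise the sum of squares of perpendicular distances of the
points from the line (moment of inertia):

$$e^2 = \sum_i (x_i\sin\theta - y_i\cos\theta + \rho)^2$$

$$\frac{d}{d\rho}\,e^2 = 2\Big[\sin\theta\sum_i x_i - \cos\theta\sum_i y_i + \rho\sum_i 1\Big]$$

For this to be zero we must have:

$$\rho = -(x_0\sin\theta - y_0\cos\theta) \qquad\text{where}\qquad x_0 = \sum_i x_i/n, \quad y_0 = \sum_i y_i/n$$

That is, the line passes through the centre of gravity of the points.
Changing coordinates to a system relative to this favoured point we get:

$$x_i' = x_i - x_0 \qquad y_i' = y_i - y_0$$

$$e^2 = \sum_i \big[(x_i-x_0)\sin\theta - (y_i-y_0)\cos\theta\big]^2 = \sum_i (x_i'\sin\theta - y_i'\cos\theta)^2$$

$$= \sin^2\theta\sum_i x_i'^2 - 2\sin\theta\cos\theta\sum_i x_i'y_i' + \cos^2\theta\sum_i y_i'^2$$

$$= \tfrac{1}{2}\Big[\Big(\sum_i x_i'^2 + \sum_i y_i'^2\Big) - \Big(\sum_i x_i'^2 - \sum_i y_i'^2\Big)\cos 2\theta - 2\sum_i x_i'y_i'\,\sin 2\theta\Big]$$

Note that we can find some of these terms as follows:

$$\sum_i x_i'^2 = \sum_i x_i^2 - n x_0^2 \qquad \sum_i y_i'^2 = \sum_i y_i^2 - n y_0^2 \qquad \sum_i x_i'y_i' = \sum_i x_i y_i - n x_0 y_0$$

For compactness let:

$$a = 2\sum_i x_i'y_i' \qquad b = \sum_i x_i'^2 - \sum_i y_i'^2 \qquad c = \sqrt{a^2+b^2}$$

We get:

$$\frac{d}{d\theta}\,e^2 = b\sin 2\theta - a\cos 2\theta$$

For this to be zero we must have:

$$\tan 2\theta = a/b \qquad\text{i.e.}\qquad \sin 2\theta = a/c, \quad \cos 2\theta = b/c$$

Now

$$\cos\theta = \sqrt{\frac{1+\cos 2\theta}{2}} = \sqrt{\frac{b+c}{2c}}$$

And

$$\sin\theta = \frac{\sin 2\theta}{2\cos\theta} = \frac{a}{2c}\Big/\sqrt{\frac{b+c}{2c}}$$

If $a = 0$ and $b > 0$, so is $\sin\theta$. (If $a = 0$ and $b < 0$ the quotient
is 0/0; the line is then parallel to the y-axis, with $\cos\theta = 0$ and
$\sin\theta = \pm 1$.) Again we can choose to multiply the whole thing by -1
to decide the direction.

An equivalent way of getting this last result is the following:

$$\tan\theta = \frac{-1+\sqrt{(\tan 2\theta)^2+1}}{\tan 2\theta} = \frac{-b+\sqrt{a^2+b^2}}{a} = \frac{c-b}{a} = \frac{a}{c+b}$$

$$\cos\theta = \frac{1}{\sqrt{1+(\tan\theta)^2}} = \frac{a}{\sqrt{a^2+(b-c)^2}} = \frac{a}{\sqrt{2c(c-b)}} = \sqrt{\frac{c+b}{2c}}$$
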
Different least-squares fits can be described (for example one which 
minimises the sum of squares of distance parallel to the y-axis), but 
this one has the property that the fit does not depend on the coordinate 
system (invariant with rotation for example). 

Note that while trigonometric functions appear all over these results,
none ever get evaluated. Trigonometric functions are a most useful
intellectual crutch; there is, however, seldom a need to actually
use them in numeric calculations. One can usually replace them
using the well-known relationships amongst them and reduce the required
operations to +, -, *, / and $\sqrt{\ }$ only. In the above formulas
for least-squares fitting, for example, the only requirement is for two
square root evaluations.
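The whole fit can indeed be coded with two square roots and no trigonometric calls. The following is a sketch under stated assumptions: the function name and return convention `(sin_t, cos_t, rho)` are chosen for the example, and the handling of the vertical case (where the quotient for the sine becomes 0/0) is an addition not spelled out in the derivation.

```python
import math

def fit_line(points):
    """Perpendicular least-squares line through a list of (x, y) points.

    Returns (sin_t, cos_t, rho) with  x*sin_t - y*cos_t + rho = 0.
    """
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n
    sxx = sum(x * x for x, _ in points) - n * x0 * x0
    syy = sum(y * y for _, y in points) - n * y0 * y0
    sxy = sum(x * y for x, y in points) - n * x0 * y0
    a = 2.0 * sxy
    b = sxx - syy
    c = math.sqrt(a * a + b * b)                      # first square root
    if c == 0.0:
        raise ValueError("points have no preferred direction")
    cos_t = math.sqrt(max(b + c, 0.0) / (2.0 * c))    # second square root
    if cos_t == 0.0:
        sin_t = 1.0                                   # line parallel to the y-axis
    else:
        sin_t = (a / (2.0 * c)) / cos_t
    rho = -(x0 * sin_t - y0 * cos_t)
    return sin_t, cos_t, rho
```

Because the sums are formed from raw coordinates, points far from the origin may lose precision; subtracting the centroid first is a common remedy.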



We ought to also check that we in fact have a minimum:

$$\frac{d^2}{d\rho^2}\,e^2 = 2n > 0$$

$$\frac{d^2}{d\theta^2}\,e^2 = 2b\cos 2\theta + 2a\sin 2\theta = 2\,\frac{a^2+b^2}{c} = 2\sqrt{a^2+b^2} > 0$$



So indeed we have a minimum. We might also want to know the average error, 
and the average error if, instead, we had chosen the worst line (one at 
right angles to the best line) . 

$$e_1^2 = (\sin\theta)^2\sum_i x_i'^2 - 2\cos\theta\sin\theta\sum_i x_i'y_i' + (\cos\theta)^2\sum_i y_i'^2$$

$$e_2^2 = (\cos\theta)^2\sum_i x_i'^2 + 2\sin\theta\cos\theta\sum_i x_i'y_i' + (\sin\theta)^2\sum_i y_i'^2$$

Let $d = \sum_i x_i'^2 + \sum_i y_i'^2$; then we can also write the above as:

$$e_1^2 = \tfrac{1}{2}(d-c)$$

$$e_2^2 = \tfrac{1}{2}(d+c)$$

The ratio can be used as a "form-factor".



Note that all of this line-fitting can be modified to handle weighted
points by simply multiplying the coordinates $(x_i,y_i)$ by the weights $w_i$
and using $\sum_i w_i$ instead of $n$.

Now suppose we are given several lines and are required to find
a point with minimum sum of squares of perpendicular distance.

$$e^2 = \sum_i (x\sin\theta_i - y\cos\theta_i + \rho_i)^2$$

$$= x^2\sum_i(\sin\theta_i)^2 - 2xy\sum_i\sin\theta_i\cos\theta_i + y^2\sum_i(\cos\theta_i)^2 + 2x\sum_i\rho_i\sin\theta_i - 2y\sum_i\rho_i\cos\theta_i + \sum_i\rho_i^2$$

$$\frac{d}{dx}\,e^2 = 2\Big(x\sum_i(\sin\theta_i)^2 - y\sum_i\sin\theta_i\cos\theta_i + \sum_i\rho_i\sin\theta_i\Big)$$

$$\frac{d}{dy}\,e^2 = 2\Big(-x\sum_i\sin\theta_i\cos\theta_i + y\sum_i(\cos\theta_i)^2 - \sum_i\rho_i\cos\theta_i\Big)$$

Let

$$\Delta = \sum_i(\sin\theta_i)^2\,\sum_i(\cos\theta_i)^2 - \Big(\sum_i\sin\theta_i\cos\theta_i\Big)^2$$

This will only be zero if all the lines are parallel. Setting both
derivatives to zero and solving the resulting set of equations in x and y
we get:

$$x = \Big[-\sum_i(\cos\theta_i)^2\sum_i\rho_i\sin\theta_i + \sum_i\sin\theta_i\cos\theta_i\sum_i\rho_i\cos\theta_i\Big]\Big/\Delta$$

$$y = \Big[-\sum_i\sin\theta_i\cos\theta_i\sum_i\rho_i\sin\theta_i + \sum_i(\sin\theta_i)^2\sum_i\rho_i\cos\theta_i\Big]\Big/\Delta$$

We also ought to check whether this gives us a minimum:

$$\frac{d^2}{dx^2}\,e^2 = 2\sum_i(\sin\theta_i)^2 > 0 \qquad\qquad \frac{d^2}{dy^2}\,e^2 = 2\sum_i(\cos\theta_i)^2 > 0$$

We can weight the lines by simply multiplying $\sin\theta_i$, $\cos\theta_i$, $\rho_i$
by the weights $w_i$.
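The least-squares point for a set of lines can be computed from five sums. The sketch below assumes each line is given as a tuple `(sin_t, cos_t, rho)` satisfying x sin θ - y cos θ + ρ = 0; the function name and the error raised for parallel lines are choices made for the example.

```python
def closest_point(lines):
    """Point minimising the sum of squared distances to the given lines."""
    s_ss = sum(s * s for s, c, r in lines)
    s_cc = sum(c * c for s, c, r in lines)
    s_sc = sum(s * c for s, c, r in lines)
    r_s = sum(r * s for s, c, r in lines)
    r_c = sum(r * c for s, c, r in lines)
    delta = s_ss * s_cc - s_sc * s_sc
    if delta == 0.0:
        raise ValueError("all lines are parallel")
    x = (-s_cc * r_s + s_sc * r_c) / delta
    y = (-s_sc * r_s + s_ss * r_c) / delta
    return x, y
```

For two non-parallel lines this reduces to their intersection; with more lines it returns a compromise vertex position.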







PROJECTION OF A RECTANGULAR CORNER: 




/""""N 



Given that the tri-hedral vertex is formed by three planes meeting
at right angles, find the inclinations of the three lines $\alpha$, $\beta$, $\gamma$
relative to the image plane. Let these angles be $a$, $b$, $c$. Note that
the angle between the plane containing $\beta$ and $\gamma$ and the view vector
is also $a$, and so on for the other sides of the object.

This information is useful in defining the elevation and rotation
of the eye relative to the coordinate system implicitly defined by the
rectangular object. The angles can also be used to correct the
foreshortening introduced by the inclination of the lines relative to the
image plane.

Now consider the following spherical triangle: 
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Using a spherical trigonometry formula we get: 



$$0 = \cos\tfrac{\pi}{2} = \cos(\tfrac{\pi}{2}-a)\cos(\tfrac{\pi}{2}-b) + \sin(\tfrac{\pi}{2}-a)\sin(\tfrac{\pi}{2}-b)\cos C$$

$$= \sin a\sin b + \cos a\cos b\cos C$$

So $\cos C = -\tan a\tan b$, and by symmetry:

$$\cos B = -\tan c\tan a$$

$$\cos A = -\tan b\tan c$$

$$\tan a = \sqrt{\frac{-\cos B\cos C}{\cos A}}$$

$$\cos a = \frac{1}{\sqrt{1+\tan^2 a}} = \sqrt{\frac{\cos A}{\cos A - \cos B\cos C}}$$

Where $a < 0$ if and only if $A > \pi$.

So we have the angle of the line $\alpha$ relative to the image plane. As
mentioned, this also gives us the inclination of the plane containing
$\beta$ and $\gamma$ relative to the view vector. The others are found by symmetry.

To get the "unforeshortened" length of the lines, that is the length in
the image if they had been oriented at right angles to the view vector,
we just divide by the cosine of the inclination angle:

$$\ell' = \ell / \cos a$$

We also note that the cosines needed in the formulae can be obtained
simply by using dot-products of the image vectors $\mathbf{u}$, $\mathbf{v}$ along
the two lines enclosing the angle:

$$\cos A = \frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}$$

So we don't need to use trig-functions at all, only +, -, *, / and $\sqrt{\ }$.
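A small sketch of this computation follows. It assumes an orthographic projection, that the three edges are given as 2-D image vectors pointing away from the vertex, and that A is the image angle between the projections of the second and third edges (and cyclically for B and C); the function names are invented for the example.

```python
import math

def _cos_between(u, v):
    """Cosine of the angle between two 2-D image vectors, via a dot-product."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / math.sqrt((u[0] ** 2 + u[1] ** 2) * (v[0] ** 2 + v[1] ** 2))

def cos_inclination(p_alpha, p_beta, p_gamma):
    """cos(a) for the first edge of a rectangular corner, from the image
    projections of all three edges."""
    cos_A = _cos_between(p_beta, p_gamma)
    cos_B = _cos_between(p_gamma, p_alpha)
    cos_C = _cos_between(p_alpha, p_beta)
    return math.sqrt(cos_A / (cos_A - cos_B * cos_C))

def unforeshortened_length(image_length, cos_a):
    """Length the edge would have if it lay parallel to the image plane."""
    return image_length / cos_a
```

The formula breaks down when an edge lies exactly in the image plane, since two of the image angles are then right angles and the quotient becomes 0/0.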
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A few other random formulae in this relation: 

Since $A + B + C = 2\pi$, $\cos A = \cos B\cos C - \sin B\sin C$.
Because of that:

$$\cos a = \sqrt{\frac{-\cos A}{\sin B\sin C}}$$

$$\sin a = \sqrt{\frac{\tan^2 a}{1+\tan^2 a}} = \sqrt{\frac{\cos B\cos C}{\cos B\cos C - \cos A}} = \sqrt{\frac{\cos B\cos C}{\sin B\sin C}}$$


Another derivation not involving spherical triangles is as follows.
Take two of the edges to have unit length and let $x$ be the distance
between their end-points in the image. Using the formulae for plane
triangles:

$$(\sin a - \sin b)^2 + x^2 = 2$$

$$x^2 = \cos^2 a + \cos^2 b - 2\cos a\cos b\cos C$$

$$2 = (\sin^2 a + \cos^2 a) + (\sin^2 b + \cos^2 b) - 2\sin a\sin b - 2\cos a\cos b\cos C$$

$$\cos C = -\tan a\tan b \qquad\text{as before}$$

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RELATIONS BETWEEN SIDES AND ANGLES OF ANY PLANE TRIANGLE: 




$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} \qquad (= \text{diameter of circumscribed circle})$$

$$a^2 = b^2 + c^2 - 2bc\cos A$$

$$a = b\cos C + c\cos B$$

RELATIONS IN ANY SPHERICAL TRIANGLE:

$$\frac{\sin A}{\sin a} = \frac{\sin B}{\sin b} = \frac{\sin C}{\sin c}$$

$$\cos a = \cos b\cos c + \sin b\sin c\cos A$$

$$\cos A = -\cos B\cos C + \sin B\sin C\cos a$$

($a$, $b$, $c$ are the lengths of the arcs on a unit sphere; alternatively
one can think of them as the angles (in radians) subtended by the
arcs at the centre of the sphere.)
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PROXIMITY FINDING: 

In the later phases of line-finding programs it is often necessary 
to repeatedly locate lines that are close together, lines that pass 
close to a given vertex and so on. To do this efficiently we require 
a fast access method to locate likely candidates for more sensitive 
tests. First consider the problem of deciding if two points are close 
together. 

To tell if x and y are close together we can quantise both (by dividing
by the quantisation interval size d and truncating). If the two numbers
[x/d] and [y/d] are the same we win, but it may be that x and y just
straddle a boundary defined by our truncation algorithm. We need
a second, interlaced set of boundaries and calculate [x/d+.5], [y/d+.5].
Now if either pair of numbers matches we know that |x-y| < d. Conversely if
|x-y| < d/2 we are guaranteed that at least one pair will match.

This method only comes into its own if we have large sets of points. We 
then simply find the two integer-codes for each one and add it to the 
appropriate buckets. To find which points are near a given point we 
determine its two integer-codes and collect the union of the two 
corresponding buckets. 
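A one-dimensional version of the two interlaced bucket sets might look like the sketch below. The class and method names are invented for the example, and `math.floor` is used in place of truncation so that negative coordinates are handled consistently (an assumption beyond the text).

```python
import math
from collections import defaultdict

class ProximityIndex:
    """Two interlaced sets of buckets of width d for fast neighbour lookup."""

    def __init__(self, d):
        self.d = d
        self.buckets = (defaultdict(list), defaultdict(list))

    def _codes(self, x):
        # The two integer codes: plain grid and grid shifted by half a cell.
        return (math.floor(x / self.d), math.floor(x / self.d + 0.5))

    def add(self, x):
        for table, code in zip(self.buckets, self._codes(x)):
            table[code].append(x)

    def near(self, x):
        """Union of the two buckets x falls in: contains every stored value
        within d/2 of x, and nothing further away than d."""
        found = set()
        for table, code in zip(self.buckets, self._codes(x)):
            found.update(table.get(code, ()))
        return found
```

The candidates returned by `near` are only likely neighbours; a more sensitive test on the actual distance would normally follow.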

* > 




This method can now be extended to n dimensions. We need at least n+1 sets
of buckets if we use n-tetrahedrons. It may be more convenient to use $2^n$
sets of buckets if the unit cell is an n-cube (d/n versus d/2 minimum
separation).

Line-segments and curves can be handled by entering each point on them
into the system. In practice one will only enter a set of points separated
by the minimum distance guaranteed by the geometry used. Retrieval works
in a symmetric way.
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LEAST SQUARES SOLUTION OF AN OVERDETERMINED SET OF EQUATIONS: 

Let us write the equations as follows: 

A x = y + e 

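Where $A$ is a given m by n matrix (m > n), $\mathbf{x}$ is the unknown n-vector,
$\mathbf{y}$ is a given m-vector and $\mathbf{e}$ is an m-vector of errors which we are
trying to minimise.

$$\mathbf{e}^T\mathbf{e} = (A\mathbf{x}-\mathbf{y})^T(A\mathbf{x}-\mathbf{y})$$

$$= (\mathbf{x}^TA^T - \mathbf{y}^T)(A\mathbf{x}-\mathbf{y})$$

$$= \mathbf{x}^TA^TA\mathbf{x} - \mathbf{y}^TA\mathbf{x} - \mathbf{x}^TA^T\mathbf{y} + \mathbf{y}^T\mathbf{y}$$

$$\frac{d}{d\mathbf{x}}\,\mathbf{e}^T\mathbf{e} = \mathbf{x}^TA^TA + (A^TA\mathbf{x})^T - \mathbf{y}^TA - (A^T\mathbf{y})^T + 0$$

So for this to be zero:

$$\mathbf{x}^TA^TA = \mathbf{y}^TA$$

$$A^TA\,\mathbf{x} = A^T\mathbf{y}$$

$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{y}$$

$$\frac{d^2}{d\mathbf{x}^2}\,\mathbf{e}^T\mathbf{e} = 2A^TA$$

This matrix is positive semi-definite (and positive definite when the
columns of $A$ are independent, in which case its diagonal elements are
clearly positive), so we do have a minimum.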
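For small problems the normal equations can be formed and solved directly. The sketch below uses plain lists and Gaussian elimination with partial pivoting rather than an explicit inverse; the function name and the singularity tolerance are assumptions for the example.

```python
def least_squares(A, y):
    """Solve A x = y in the least-squares sense via (A^T A) x = A^T y.

    A is a list of m rows of length n (m >= n); y is a list of length m.
    """
    m, n = len(A), len(A[0])
    # Normal matrix A^T A, augmented with the right-hand side A^T y.
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         + [sum(A[k][i] * y[k] for k in range(m))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("A^T A is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):           # back-substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

Forming the normal matrix squares the condition number of the problem, so for ill-conditioned systems an orthogonal factorisation is usually preferred.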



/"N 
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LEAST SQUARES CURVE FITTING: 



Suppose we have a function $g(x)$ defined at the n points $x_1, x_2$ etc.,
and that we are trying to fit a function $f(x)$ which depends on the
m parameters $a_1, a_2$ etc. so as to minimise the sum of squares of
errors at the points $x_1, x_2$ etc. Let $e_i$ be the error at point $x_i$.

$$e_i = f(x_i) - g(x_i)$$

Let $\mathbf{e}$ be the n-vector of errors, $\mathbf{f}$ the n-vector of fitted values, $\mathbf{g}$ the
n-vector of defined values. Then we are trying to minimise:

$$\mathbf{e}^T\mathbf{e} = (\mathbf{f}-\mathbf{g})^T(\mathbf{f}-\mathbf{g})$$

by varying the parameter m-vector $\mathbf{a}$. The derivative of $\mathbf{e}^T\mathbf{e}$ w.r.t.
this vector must be zero (and the second derivative positive).



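$$(\mathbf{f}-\mathbf{g})^T\,\frac{d\mathbf{f}}{d\mathbf{a}} = 0 \qquad\text{where } \frac{d\mathbf{f}}{d\mathbf{a}} \text{ is an n by m matrix}$$

$$\text{So:}\qquad \mathbf{g}^T\,\frac{d\mathbf{f}}{d\mathbf{a}} = \mathbf{f}^T\,\frac{d\mathbf{f}}{d\mathbf{a}}$$

or written out in full

$$\sum_{i=1}^{n} g(x_i)\,\frac{\partial f(x_i)}{\partial a_j} = \sum_{i=1}^{n} f(x_i)\,\frac{\partial f(x_i)}{\partial a_j} \qquad j = 1, \ldots, m$$

To be able to solve these equations we choose a particularly simple
form for $f(x)$, namely $f(x_i) = \sum_j a_j f_j(x_i)$, that is a linear one.
In this case the terms in the matrix $d\mathbf{f}/d\mathbf{a}$, namely $\partial f(x_i)/\partial a_j$, become:

$$\frac{\partial f(x_i)}{\partial a_j} = f_j(x_i)$$

So the matrices become quite simple and let us denote them by $F$:

$$F = \frac{d\mathbf{f}}{d\mathbf{a}} \qquad F_{ij} = f_j(x_i)$$

We also note that because of the simple dependence of $f(x)$ on the
parameters $a_j$ we get:

$$\mathbf{f} = F\,\mathbf{a}$$

And we can rewrite the main equation as:

$$F^T\mathbf{g} = F^TF\,\mathbf{a}$$

Since $F^TF$ is square we can attempt to invert it and get:

$$\mathbf{a} = (F^TF)^{-1}F^T\mathbf{g}$$

So inverting the normal matrix allows us to solve for the parameters $\mathbf{a}$.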



/"■"N 
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EXAMPLE OF FITTING A STRAIGHT LINE: 



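$$f(x) = a_1 + a_2 x \qquad\text{and let}\qquad y_i = g(x_i)$$

Here

$$\mathbf{a} = (a_1, a_2)^T \qquad \mathbf{g} = (y_1, y_2, \ldots, y_n)^T \qquad f_1(x) = 1 \qquad f_2(x) = x$$

$$F^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{pmatrix} \qquad F^TF = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}$$

Let $\Delta = n\sum x_i^2 - (\sum x_i)^2$, then

$$(F^TF)^{-1} = \frac{1}{\Delta}\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & n \end{pmatrix} \qquad F^T\mathbf{g} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}$$

$$\mathbf{a} = (F^TF)^{-1}F^T\mathbf{g}$$

$$a_1 = \Big(\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i\Big)/\Delta$$

$$a_2 = \Big(-\sum x_i \sum y_i + n\sum x_i y_i\Big)/\Delta$$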
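The closed form for the two parameters is easily coded. The sketch below is illustrative only; the function name and the error raised when all x values coincide are choices made for the example.

```python
def fit_straight_line(xs, ys):
    """Fit y = a1 + a2*x, minimising squared errors in the y direction."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    delta = n * sxx - sx * sx
    if delta == 0:
        raise ValueError("all x values coincide")
    a1 = (sxx * sy - sx * sxy) / delta
    a2 = (n * sxy - sx * sy) / delta
    return a1, a2
```

Since only errors parallel to the y-axis are minimised, the result changes if the coordinate system is rotated, and a vertical line cannot be fitted at all.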



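Note that this is an unsymmetrical method different from the one
demonstrated elsewhere in this memo and not suited for fitting
lines in a line-drawing, for example.

APPLICATION TO FITTING A POLYNOMIAL:

$$f(x) = \sum_{j=0}^{m-1} a_j x^j \qquad\text{and let}\qquad y_i = g(x_i)$$

$$\mathbf{a} = (a_0, a_1, \ldots, a_{m-1})^T \qquad \mathbf{g} = (y_1, y_2, \ldots, y_n)^T \qquad f_j(x) = x^j$$

$$F^T = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_1^{m-1} & x_2^{m-1} & \cdots & x_n^{m-1} \end{pmatrix}$$

The "normal" matrix:

$$F^TF = \begin{pmatrix} n & \sum x_i & \cdots & \sum x_i^{m-1} \\ \sum x_i & \sum x_i^2 & \cdots & \sum x_i^{m} \\ \vdots & \vdots & & \vdots \\ \sum x_i^{m-1} & \sum x_i^{m} & \cdots & \sum x_i^{2m-2} \end{pmatrix} \qquad F^T\mathbf{g} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \\ \vdots \\ \sum x_i^{m-1} y_i \end{pmatrix}$$

At this stage we obtain the parameters $\mathbf{a} = (F^TF)^{-1}F^T\mathbf{g}$.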



r^ 
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FITTING EXPONENTIALS: 
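$$f(x_i) = \sum_{j=0}^{m-1} a_j\, s_j^{x_i} \qquad f_j(x) = s_j^{x}$$

$$F^T = \begin{pmatrix} s_0^{x_0} & s_0^{x_1} & \cdots & s_0^{x_{n-1}} \\ s_1^{x_0} & s_1^{x_1} & \cdots & s_1^{x_{n-1}} \\ \vdots & \vdots & & \vdots \\ s_{m-1}^{x_0} & s_{m-1}^{x_1} & \cdots & s_{m-1}^{x_{n-1}} \end{pmatrix} \qquad \mathbf{g} = (y_0, y_1, \ldots, y_{n-1})^T$$

$$(F^TF)_{ij} = \sum_{k=0}^{n-1} (s_i s_j)^{x_k}$$

Now suppose we have regular intervals $x_k = kT$. Let $s_i' = s_i^T$ and $s_j' = s_j^T$.

$$(F^TF)_{ij} = \sum_{k=0}^{n-1} (s_i s_j)^{kT} = \sum_{k=0}^{n-1} (s_i' s_j')^{k} = \frac{1-(s_i' s_j')^{n}}{1-s_i' s_j'}$$

Next take the special case $s_a' = w^a$, where $w = e^{2\pi\sqrt{-1}/n}$ and there
are n components $a = 0, 1, \ldots, n-1$:

$$(F^TF)_{ij} = 0 \text{ for } i+j \neq 0 \text{ or } n \qquad (F^TF)_{ij} = n \text{ for } i+j = 0 \text{ or } n$$

$$F^TF = n\begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & & & & \vdots \\ 0 & 1 & \cdots & 0 & 0 \end{pmatrix} \qquad (F^TF)^{-1} = \frac{1}{n}\begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & & & & \vdots \\ 0 & 1 & \cdots & 0 & 0 \end{pmatrix}$$

$$\mathbf{a} = (F^TF)^{-1}F^T\mathbf{g} = \frac{1}{n}\begin{pmatrix} \sum_k w^{-0k} y_k \\ \sum_k w^{-1k} y_k \\ \vdots \\ \sum_k w^{-(n-1)k} y_k \end{pmatrix}$$

Where we used $w^{(n-a)k} = w^{-ak}$. Note: $\mathbf{a}$ is the discrete F.T. of $\mathbf{g}$.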
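This special case can be checked numerically. The sketch below assumes n samples and n components with the roots of unity as bases, as in the special case above; the function names are invented, and the direct O(n^2) sums are used for clarity rather than a fast transform.

```python
import cmath

def fit_roots_of_unity(ys):
    """Coefficients a_a = (1/n) * sum_k w**(-a*k) * y_k, with w = exp(2*pi*i/n)."""
    n = len(ys)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(w ** (-a * k) * ys[k] for k in range(n)) / n for a in range(n)]

def evaluate(coeffs, k):
    """Fitted value f(x_k) = sum_a a_a * w**(a*k)."""
    n = len(coeffs)
    w = cmath.exp(2j * cmath.pi / n)
    return sum(c * w ** (a * k) for a, c in enumerate(coeffs))
```

With as many components as samples the fit is exact, so evaluating the fitted function at the sample points reproduces the data.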



jf*\ 
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APPLICATION TO FITTING NON-HARMONICALLY RELATED SINES AND COSINES: 

Assume regular sample intervals: x. = iT . Let s. = ****!*, where 4. 
are the frequencies of the various components. The s. should be non-zero, 
positive and unique. 

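$$f(x_i) = a_0 + \sum_{j=1}^{m} a_j\cos(s_j i) + \sum_{j=1}^{m} b_j\sin(s_j i)$$

$$F^T = \begin{pmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \cos(s_1) & \cos(2s_1) & \cdots & \cos((n-1)s_1) \\ 1 & \cos(s_2) & \cos(2s_2) & \cdots & \cos((n-1)s_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \cos(s_m) & \cos(2s_m) & \cdots & \cos((n-1)s_m) \\ 0 & \sin(s_1) & \sin(2s_1) & \cdots & \sin((n-1)s_1) \\ 0 & \sin(s_2) & \sin(2s_2) & \cdots & \sin((n-1)s_2) \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & \sin(s_m) & \sin(2s_m) & \cdots & \sin((n-1)s_m) \end{pmatrix}$$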
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Now note that 2 cos A cos B = cos(A+B) + cos(A-B) 

             2 sin A cos B = sin(A+B) + sin(A-B)
             2 sin A sin B = cos(A-B) - cos(A+B)          and let s_0 = 0

    (F^T F)_{i,j} = \sum_{k=0}^{n-1} \cos(s_i k)\cos(s_j k)
                  = (1/2) [ \sum_k \cos((s_i+s_j)k) + \sum_k \cos((s_i-s_j)k) ]
                                                   for 0 \le i \le m and 0 \le j \le m

    (F^T F)_{i,j+m} = \sum_{k=0}^{n-1} \cos(s_i k)\sin(s_j k)
                    = (1/2) [ \sum_k \sin((s_i+s_j)k) + \sum_k \sin((s_j-s_i)k) ]
                                                   for 0 \le i \le m and 1 \le j \le m

    (F^T F)_{i+m,j+m} = \sum_{k=0}^{n-1} \sin(s_i k)\sin(s_j k)
                      = (1/2) [ \sum_k \cos((s_i-s_j)k) - \sum_k \cos((s_i+s_j)k) ]
                                                   for 1 \le i \le m and 1 \le j \le m

The other terms can be found using the symmetry of (F^T F).
Now \sum_{k=0}^{n-1} e^{jwk} = (1 - e^{jwn})/(1 - e^{jw})

    = (e^{jwn/2}/e^{jw/2}) (e^{jwn/2} - e^{-jwn/2})/(e^{jw/2} - e^{-jw/2})
    = e^{jw(n-1)/2} \sin(nw/2) / \sin(w/2)

Now since cos(A) = Re(e^{jA}) and sin(A) = Im(e^{jA}):

    \sum_{k=0}^{n-1} \cos(wk) = \cos(w(n-1)/2) \sin(nw/2) / \sin(w/2)
                              = (1/2) ( \sin(w/2) + \sin((2n-1)w/2) ) / \sin(w/2)
                              = (1/2) ( 1 + \sin((2n-1)w/2) / \sin(w/2) )

unless w = 0 in which case the sum is n.

    \sum_{k=0}^{n-1} \sin(wk) = \sin(w(n-1)/2) \sin(nw/2) / \sin(w/2)
                              = (1/2) ( \cos(w/2) - \cos((2n-1)w/2) ) / \sin(w/2)

unless w = 0 in which case the sum is 0.

    (F^T F)_{i,j} = \frac{1}{4} \left( 2 + \frac{\sin((2n-1)(s_i+s_j)/2)}{\sin((s_i+s_j)/2)}
                                         + \frac{\sin((2n-1)(s_i-s_j)/2)}{\sin((s_i-s_j)/2)} \right)

for 0 \le i \le m and 0 \le j \le m and i \ne j. If i = j, the third term is 2n-1;
for i = j = 0, the second and third term become 2n-1.

    (F^T F)_{i,j+m} = \frac{1}{4} \left( \frac{\cos((s_i+s_j)/2) - \cos((2n-1)(s_i+s_j)/2)}{\sin((s_i+s_j)/2)}
                                       + \frac{\cos((s_j-s_i)/2) - \cos((2n-1)(s_j-s_i)/2)}{\sin((s_j-s_i)/2)} \right)

for 0 \le i \le m and 1 \le j \le m and i \ne j. If i = j, the second term is 0.

    (F^T F)_{i+m,j+m} = \frac{1}{4} \left( \frac{\sin((2n-1)(s_i-s_j)/2)}{\sin((s_i-s_j)/2)}
                                         - \frac{\sin((2n-1)(s_i+s_j)/2)}{\sin((s_i+s_j)/2)} \right)

for 1 \le i \le m and 1 \le j \le m and i \ne j. If i = j the first term is 2n-1.
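As a check on the closed forms for the trigonometric sums, the following sketch builds (F^T F) both ways and compares them. The function names and the test frequencies are assumptions for illustration only; the closed-form sums fall back to n (cosine) or 0 (sine) when sin(w/2) vanishes.

```python
import math


def sum_cos(w, n):
    """Closed form of sum_{k=0}^{n-1} cos(w k)."""
    if abs(math.sin(w / 2.0)) < 1e-12:
        return float(n)
    return 0.5 * (1.0 + math.sin((2 * n - 1) * w / 2.0) / math.sin(w / 2.0))


def sum_sin(w, n):
    """Closed form of sum_{k=0}^{n-1} sin(w k)."""
    if abs(math.sin(w / 2.0)) < 1e-12:
        return 0.0
    return 0.5 * (math.cos(w / 2.0) - math.cos((2 * n - 1) * w / 2.0)) / math.sin(w / 2.0)


def normal_matrix_closed(s, n):
    """F^T F from the closed forms; s = [s_1, ..., s_m], with s_0 = 0 prepended."""
    s0 = [0.0] + list(s)
    m = len(s)
    size = 2 * m + 1
    ftf = [[0.0] * size for _ in range(size)]
    for i in range(m + 1):                      # cosine-cosine block
        for j in range(m + 1):
            ftf[i][j] = 0.5 * (sum_cos(s0[i] + s0[j], n) + sum_cos(s0[i] - s0[j], n))
    for i in range(m + 1):                      # cosine-sine block and its transpose
        for j in range(1, m + 1):
            v = 0.5 * (sum_sin(s0[i] + s0[j], n) + sum_sin(s0[j] - s0[i], n))
            ftf[i][j + m] = v
            ftf[j + m][i] = v
    for i in range(1, m + 1):                   # sine-sine block
        for j in range(1, m + 1):
            ftf[i + m][j + m] = 0.5 * (sum_cos(s0[i] - s0[j], n) - sum_cos(s0[i] + s0[j], n))
    return ftf


def normal_matrix_direct(s, n):
    """F^T F by explicit summation over the n sample rows of F."""
    rows = [[1.0] + [math.cos(sj * k) for sj in s] + [math.sin(sj * k) for sj in s]
            for k in range(n)]
    size = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]


closed = normal_matrix_closed([0.3, 1.1], 16)
direct = normal_matrix_direct([0.3, 1.1], 16)
max_diff = max(abs(c - d) for rc, rd in zip(closed, direct) for c, d in zip(rc, rd))
```

The closed forms cost a fixed number of trigonometric evaluations per entry, independent of the number of samples n.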



The first element in this array is n, the other diagonal elements are near n/2,
while most of the other terms are small, of the order of 1. The only
large elements will be the result of two narrowly separated frequencies.
This makes for good numerical stability when inverting (F^T F) by the
simplest methods.

When the frequencies are harmonically related, we have s_i = 2\pi i/n.
Then all off-diagonal terms will be 0, and those on the diagonal will
be exactly n/2 except the first which will be n. The inverse of (F^T F)
then is also diagonal with the first element 1/n and the rest 2/n. We
are back to discrete Fourier transforms in this case.

Note that if we use: 

sin(A+B) = sin A cos B + cos A sin B 

sin(A-B) = sin A cos B - cos A sin B 

cos(A+B) = cos A cos B - sin A sin B 

cos(A-B) = cos A cos B + sin A sin B 

we can obtain all the entries in the array using only a few operations
on sin(s_i/2), cos(s_i/2) and sin((2n-1)s_i/2), cos((2n-1)s_i/2).



f~\ 
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SOLVING SETS OF SIMULTANEOUS LINEAR DIFFERENCE EQUATIONS: 

    (x_1)_{n+1} = a_{11} (x_1)_n + a_{12} (x_2)_n + ... + a_{1m} (x_m)_n

    (x_2)_{n+1} = a_{21} (x_1)_n + a_{22} (x_2)_n + ... + a_{2m} (x_m)_n

        ...

    (x_m)_{n+1} = a_{m1} (x_1)_n + a_{m2} (x_2)_n + ... + a_{mm} (x_m)_n

    x_{n+1} = A x_n          where A is the given coefficient set

Assume a solution of the form r^n for each x_i:

    x_n = a r^n              where a is an m-vector of parameters

    r(a r^n) = A(a r^n)      (A - Ir) a = 0  since r^n \ne 0

A non-zero solution for a requires that det(A - Ir) = 0. The possible
a's are eigenvectors, the possible r's are eigenvalues of the matrix A.
The determinant is a polynomial of degree m in r and will usually have
m solutions, possibly complex. We get the usual problems if two roots
coincide and have to introduce additional solutions of the form n r^n,
n^2 r^n and so on. Having found r, we can solve for a using some
normalising conditions (since any multiple will also be a solution).
Then using the linearity of the set of equations we can add up the
solutions into a more general one:

    x_n = \sum_i a_i (r_i)^n     where r_i are the various solutions

Often we are only interested in stability and just check the roots:

    |r_i| < 1
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EXAMPLE FOR A TWO VARIABLE SYSTEM: 

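    (x_1)_{n+1} = a_{11} (x_1)_n + a_{12} (x_2)_n
    (x_2)_{n+1} = a_{21} (x_1)_n + a_{22} (x_2)_n

    \det \begin{pmatrix} a_{11} - r & a_{12} \\ a_{21} & a_{22} - r \end{pmatrix}
        = (a_{11} - r)(a_{22} - r) - a_{12} a_{21}
        = (a_{11} a_{22} - a_{12} a_{21}) - (a_{11} + a_{22}) r + r^2 = 0

    r = (1/2) ( (a_{11} + a_{22}) \pm \sqrt{(a_{11} + a_{22})^2 - 4 (a_{11} a_{22} - a_{12} a_{21})} )
    r = (1/2) ( (a_{11} + a_{22}) \pm \sqrt{(a_{11} - a_{22})^2 + 4 a_{12} a_{21}} )

Stability: When is | (1/2) (a \pm \sqrt{a^2 - 4b}) | < 1 ?
Here a = a_{11} + a_{22} and b = a_{11} a_{22} - a_{12} a_{21}.

    Case 1:      4b > a^2    Complex roots             |r|^2 = b, so b < 1 for stability (this implies |a| < 2)
    Case 2:      4b < a^2    Real roots     a > 0      a + \sqrt{a^2 - 4b} < 2,    i.e.   a < b + 1
    Case 3:      4b < a^2    Real roots     a < 0      -a + \sqrt{a^2 - 4b} < 2,   i.e.  -a < b + 1
    Case 2 & 3:  4b < a^2    Real roots                |a| < b + 1 (together with |a| < 2) for stability

Substituting:

    Case 1:      4 (a_{11} a_{22} - a_{12} a_{21}) > (a_{11} + a_{22})^2,   i.e.   -4 a_{12} a_{21} > (a_{11} - a_{22})^2
                 a_{11} a_{22} - a_{12} a_{21} < 1   (and hence |a_{11} + a_{22}| < 2)

    Case 2 & 3:  Opposite of above relations
                 |a_{11} + a_{22}| < 1 + (a_{11} a_{22} - a_{12} a_{21})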
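The case analysis can be checked numerically against the roots themselves. The sketch below is illustrative only: the function names and the sample matrices are assumptions, a is the trace and b the determinant as above, and the complex-root case uses |r|^2 = b.

```python
import cmath


def roots_2x2(a11, a12, a21, a22):
    """Roots of r^2 - a r + b = 0 with a = trace and b = determinant."""
    a = a11 + a22
    b = a11 * a22 - a12 * a21
    disc = cmath.sqrt(a * a - 4.0 * b)
    return (a + disc) / 2.0, (a - disc) / 2.0


def stable_by_roots(a11, a12, a21, a22):
    """Direct check: both roots strictly inside the unit circle."""
    return all(abs(r) < 1.0 for r in roots_2x2(a11, a12, a21, a22))


def stable_by_trace_det(a11, a12, a21, a22):
    """Same decision from the trace a and determinant b alone."""
    a = a11 + a22
    b = a11 * a22 - a12 * a21
    if 4.0 * b > a * a:
        return b < 1.0                      # complex roots: |r|^2 = b
    return abs(a) < 2.0 and abs(a) < b + 1.0  # real roots


# Hypothetical coefficient sets: two stable, two unstable.
examples = [
    (0.5, 0.2, 0.1, 0.3),    # real roots, stable
    (0.0, -0.5, 0.5, 0.0),   # complex roots, stable
    (0.0, -5.0, 1.0, 0.0),   # complex roots, |r|^2 = 5, unstable
    (1.6, 0.4, 0.5, 1.6),    # real roots, unstable although |a| < b + 1
]
agreement = [stable_by_roots(*e) == stable_by_trace_det(*e) for e in examples]
```

The last example shows why the real-root condition |a| < b + 1 has to be combined with |a| < 2.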
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MULTI-DIMENSIONAL NEWTON-RAPHSON ZERO-FINDING: 



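Suppose we have n functions F_i each of n parameters a_j. We are trying
to find values for the a_j's such that the F_i's are all zero.

Assume we have the value a_n at step n for the parameter vector.

This gives us the value for the function vector F_n = F(a_n).

Now consider a small change da in a_n. To a first approximation we get:

    F(a_n + da) = F(a_n) + F'(a_n) da

Where F'(a_n) is the matrix of derivatives:

    F'(a_n) = \begin{pmatrix}
        \partial F_1/\partial a_1 & \partial F_1/\partial a_2 & \cdots & \partial F_1/\partial a_n \\
        \partial F_2/\partial a_1 & \partial F_2/\partial a_2 & \cdots & \partial F_2/\partial a_n \\
        \vdots & & & \vdots \\
        \partial F_n/\partial a_1 & \partial F_n/\partial a_2 & \cdots & \partial F_n/\partial a_n
    \end{pmatrix}

For F(a_n + da) = 0 we need

    -F(a_n) = F'(a_n) da

So

    da = -(F'_n)^{-1} F_n

    a_{n+1} = a_n - (F'_n)^{-1} F_n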



So we iterate to a solution, requiring one matrix inversion per step. 
There are better methods, but few simpler. For bad hill-climbing type 
problems one can use the method of conjugate directions and various 
variations such as the so-called mixed method which is also fairly 
rapid. 
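A minimal sketch of the iteration follows. Instead of forming the inverse explicitly it solves F'(a_n) da = -F_n by elimination, which gives the same step; the two test functions, the starting point and all names are assumptions chosen for illustration.

```python
import math


def solve_linear(m, v):
    """Solve m x = v by Gauss-Jordan elimination with partial pivoting."""
    n = len(v)
    aug = [list(map(float, m[i])) + [float(v[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, n + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def newton_raphson(func, jacobian, start, steps=20):
    """Iterate a <- a + da where F'(a) da = -F(a)."""
    a = list(start)
    for _ in range(steps):
        f = func(a)
        da = solve_linear(jacobian(a), [-fi for fi in f])
        a = [ai + di for ai, di in zip(a, da)]
    return a


# Hypothetical example: intersect the circle x^2 + y^2 = 4 with the hyperbola x y = 1.
def func(a):
    x, y = a
    return [x * x + y * y - 4.0, x * y - 1.0]


def jacobian(a):
    x, y = a
    return [[2.0 * x, 2.0 * y], [y, x]]


root = newton_raphson(func, jacobian, [2.0, 0.5])
```

Convergence depends on the starting point; from a poor start the iteration can wander or hit a singular derivative matrix.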
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SIMPLE INTERPOLATION OF A FUNCTION FROM A STORED GRID: 
Rectangular grid: Suppose the origin is x_0, y_0 and the spacing d.
To find an interpolated value at a point x, y calculate as follows:

    x' = (x - x_0)/d          y' = (y - y_0)/d

    i = [x']                  j = [y']

    \Delta i = x' - i         \Delta j = y' - j

    f(x,y) \approx f_{i,j} (1 - \Delta i - \Delta j + \Delta i \Delta j) + f_{i+1,j} \Delta i (1 - \Delta j)
                 + f_{i,j+1} (1 - \Delta i) \Delta j + f_{i+1,j+1} \Delta i \Delta j
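The rectangular-grid rule can be sketched as below. The storage convention grid[i][j] = f(x_0 + i d, y_0 + j d), the function name and the test surface are assumptions for the example; [x'] is taken as the floor.

```python
import math


def interpolate(grid, x0, y0, d, x, y):
    """Bilinear interpolation on a stored rectangular grid.

    grid[i][j] is assumed to hold f(x0 + i*d, y0 + j*d).
    """
    xp = (x - x0) / d
    yp = (y - y0) / d
    i = int(math.floor(xp))
    j = int(math.floor(yp))
    di = xp - i
    dj = yp - j
    return (grid[i][j] * (1.0 - di) * (1.0 - dj)
            + grid[i + 1][j] * di * (1.0 - dj)
            + grid[i][j + 1] * (1.0 - di) * dj
            + grid[i + 1][j + 1] * di * dj)


# Hypothetical surface containing only 1, x, y and xy terms, which the rule reproduces exactly.
def surface(x, y):
    return 2.0 + 3.0 * x - y + 0.5 * x * y


x0, y0, d = 1.0, -1.0, 0.5
grid = [[surface(x0 + i * d, y0 + j * d) for j in range(4)] for i in range(4)]
value = interpolate(grid, x0, y0, d, 1.7, -0.4)
```

Note that (1 - di)(1 - dj) is the factored form of the weight 1 - di - dj + di dj. The sketch does no bounds checking, so the point must lie strictly inside the stored grid.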



Triangular grid: Again origin at x_0, y_0

x- = C-(x-x )--(y-y ))/d y' = (y-y )/d 



2 

    i = [x']                  j = [y']

    \Delta i = x' - i         \Delta j = y' - j

    f(x,y) \approx f_{i,j} (1 - \Delta i - \Delta j) + f_{i+1,j} \Delta i + f_{i,j+1} \Delta j
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WHAT ELLIPSE IS IT: 



Given an ellipse in the form: 



    A x^2 + B xy + C y^2 + D x + E y + F = 0

Determine its center, angular orientation and major and minor axes.

    x_0 = (2CD - BE)/(B^2 - 4AC)          y_0 = (2EA - DB)/(B^2 - 4AC)

x_0, y_0 are the center because we can expand as follows:

    A(x-x_0)^2 + B(x-x_0)(y-y_0) + C(y-y_0)^2 + F' = 0

    A x^2 + B xy + C y^2 + (-2Ax_0 - By_0) x + (-2Cy_0 - Bx_0) y
        + (F' + Ax_0^2 + Bx_0y_0 + Cy_0^2) = 0

So we have:

    2Ax_0 + By_0 = -D

    Bx_0 + 2Cy_0 = -E

Solving this set of equations we get the above expression for x_0, y_0.

We also now have a useful new quantity:

    F' = F - (Ax_0^2 + Bx_0y_0 + Cy_0^2)

The orientation of the ellipse is found as follows:

    \tan 2\theta = B/(A-C)

And the major and minor axes can be found as well:

    a^2 = -2F' / ((A+C) - \sqrt{B^2 + (A-C)^2})

    b^2 = -2F' / ((A+C) + \sqrt{B^2 + (A-C)^2})
These last results follow from expansion after change of coordinates:

    \left( \frac{x\cos\theta + y\sin\theta}{a} \right)^2 + \left( \frac{-x\sin\theta + y\cos\theta}{b} \right)^2 = 1

    \left( \frac{\cos^2\theta}{a^2} + \frac{\sin^2\theta}{b^2} \right) x^2
        + 2\sin\theta\cos\theta \left( \frac{1}{a^2} - \frac{1}{b^2} \right) xy
        + \left( \frac{\sin^2\theta}{a^2} + \frac{\cos^2\theta}{b^2} \right) y^2 = 1

Identifying the appropriate terms with A, B, C and F' we get:

    B/(-F') = \sin 2\theta \left( \frac{1}{a^2} - \frac{1}{b^2} \right)

    (A-C)/(-F') = \cos 2\theta \left( \frac{1}{a^2} - \frac{1}{b^2} \right)

Since 2 \sin\theta \cos\theta = \sin 2\theta and (\cos\theta)^2 - (\sin\theta)^2 = \cos 2\theta

    (A+C)/(-F') = \frac{1}{a^2} + \frac{1}{b^2}

It's also clear that:

    \left( \frac{1}{a^2} - \frac{1}{b^2} \right) = \sqrt{B^2 + (A-C)^2} / (+F')          (Assuming a > b)

The rest follows from these simple equations.
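The recipe can be exercised on an ellipse whose parameters are known in advance. In this sketch the function name and the test ellipse are assumptions. Two conventions are also assumed that the text leaves implicit: the coefficients are scaled so that A + C > 0, and the branch of tan 2\theta = B/(A-C) is chosen with atan2(-B, -(A-C)) so that \theta is the direction of the major axis (this follows from the identification above with a > b).

```python
import math


def ellipse_parameters(A, B, C, D, E, F):
    """Center, semi-axes and orientation of A x^2 + B xy + C y^2 + D x + E y + F = 0.

    Assumes the conic is an ellipse with coefficients scaled so that A + C > 0.
    """
    den = B * B - 4.0 * A * C
    x0 = (2.0 * C * D - B * E) / den
    y0 = (2.0 * A * E - B * D) / den
    f_prime = F - (A * x0 * x0 + B * x0 * y0 + C * y0 * y0)
    root = math.hypot(B, A - C)
    a = math.sqrt(-2.0 * f_prime / ((A + C) - root))   # major semi-axis
    b = math.sqrt(-2.0 * f_prime / ((A + C) + root))   # minor semi-axis
    theta = 0.5 * math.atan2(-B, -(A - C))             # direction of the major axis
    return x0, y0, a, b, theta


# Hypothetical test ellipse: semi-axes 3 and 1, rotated by 0.4 rad, centered at (2, -1).
th, a_true, b_true, cx, cy = 0.4, 3.0, 1.0, 2.0, -1.0
c, s = math.cos(th), math.sin(th)
A = c * c / a_true ** 2 + s * s / b_true ** 2
B = 2.0 * s * c * (1.0 / a_true ** 2 - 1.0 / b_true ** 2)
C = s * s / a_true ** 2 + c * c / b_true ** 2
D = -2.0 * A * cx - B * cy
E = -2.0 * C * cy - B * cx
F = A * cx * cx + B * cx * cy + C * cy * cy - 1.0
result = ellipse_parameters(A, B, C, D, E, F)
```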
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APPROXIMATION TO n! 

Stirling's formula:      n! \approx \sqrt{2\pi n} (n/e)^n

Better approximation:    n! \approx \sqrt{2\pi n} (n/e)^n e^{1/(12n)}

The fractional error of the latter is much smaller. For n=10 for
example it is .27E-5 versus .8E-2 and for n=50 it is .22E-7 versus .17E-2.
This is useful in calculating large binomial coefficients. 
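The quoted fractional errors can be reproduced directly. The function names below are illustrative assumptions; the exact value comes from Python's integer factorial.

```python
import math


def stirling(n):
    """Stirling's formula sqrt(2 pi n) (n/e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n


def stirling_corrected(n):
    """Stirling's formula with the extra factor e^(1/(12 n))."""
    return stirling(n) * math.exp(1.0 / (12.0 * n))


def fractional_error(approx, n):
    """|approx - n!| / n!"""
    exact = math.factorial(n)
    return abs(approx - exact) / exact


err_plain_10 = fractional_error(stirling(10), 10)
err_better_10 = fractional_error(stirling_corrected(10), 10)
err_plain_50 = fractional_error(stirling(50), 50)
err_better_50 = fractional_error(stirling_corrected(50), 50)
```

The plain formula is off by roughly 1/(12n), the corrected one by roughly 1/(360 n^3).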

OBTAINING A NORMALLY DISTRIBUTED RANDOM VARIABLE FROM ONE UNIFORMLY 
DISTRIBUTED: 

Suppose x. is the output of our random (pseudo ...) number generator. 
1. (Iv 6)/6 



2. \sqrt{-2 \log x_i} \sin(2\pi x_{i+1})

3. Let f(x) be the distribution we are aiming for. Now integrate it:

    F(x) = \int_{-\infty}^{x} f(t) dt

A random variable distributed as desired will be F^{-1}(x_i)
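Methods 2 and 3 can be sketched as follows. The function names, the seeded generator, and the use of an exponential density for method 3 are assumptions for illustration; the first uniform variate is shifted to (0, 1] so that the logarithm is always defined.

```python
import math
import random


def normal_from_uniform(u1, u2):
    """Method 2: sqrt(-2 log u1) sin(2 pi u2), with u1 in (0, 1] and u2 in [0, 1)."""
    return math.sqrt(-2.0 * math.log(u1)) * math.sin(2.0 * math.pi * u2)


def exponential_from_uniform(u):
    """Method 3 for f(x) = e^(-x), x >= 0: F(x) = 1 - e^(-x), so F^-1(u) = -log(1 - u)."""
    return -math.log(1.0 - u)


rng = random.Random(1)  # fixed seed so the run is repeatable
samples = [normal_from_uniform(1.0 - rng.random(), rng.random()) for _ in range(20000)]
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)

exp_samples = [exponential_from_uniform(rng.random()) for _ in range(20000)]
exp_mean = sum(exp_samples) / len(exp_samples)
```

Each call of method 2 consumes two uniform numbers and yields one normal variate with zero mean and unit variance.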



MULTIPLICATIVE RANDOM GENERATORS: 

    x_{i+1} = A^k x_i (mod p)          p a large prime

    A a primitive root of p, k not a factor of p-1

    Example:  p = 2^35 - 31     A = 5     k = 5

              p = 2^31 - 1      A = 7     k = 5

Additive congruential generators are better, though. 
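A minimal sketch of the second example (p = 2^31 - 1, A = 7, k = 5, so the multiplier is 7^5 = 16807). The generator function and its argument names are assumptions; Python integers do not overflow, so no special care is needed for the product.

```python
def multiplicative_generator(seed, a=7, k=5, p=2 ** 31 - 1):
    """Yield x_{i+1} = (a^k) x_i mod p forever, starting from the given seed."""
    multiplier = pow(a, k, p)
    x = seed
    while True:
        x = (multiplier * x) % p
        yield x


gen = multiplicative_generator(1)
first = next(gen)
second = next(gen)

# Step a fresh generator 10000 times from seed 1.
gen = multiplicative_generator(1)
value = 0
for _ in range(10000):
    value = next(gen)
```

The seed must lie between 1 and p-1; a seed of 0 would stay at 0 forever.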
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FAST IN-POSITION MATRIX INVERSE: 

Do i = 0 ( 1 ) n-1

    com <- a(i,i)
    a(i,i) <- 1.

    Do j = 0 ( 1 ) n-1
        a(i,j) <- a(i,j) / com
    End

    Do k = 0 ( 1 ) n-1 and k \ne i
        com <- a(k,i)
        a(k,i) <- 0.

        Do j = 0 ( 1 ) n-1
            a(k,j) <- a(k,j) - a(i,j) * com
        End

    End

End

Note that rows and columns are never shuffled and that there will be 
matrices which while not singular will cause this procedure to fail. 

The matrix is n by n and stored in the array a(i,j) where i and j
range from 0 to n-1.
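The procedure translates almost line for line. This sketch assumes the matrix is a list of row lists of floats; the function name and the explicit zero-pivot error are additions for illustration, reflecting the note that some non-singular matrices defeat the method.

```python
def invert_in_place(a):
    """Replace the square matrix a (list of row lists) by its inverse, without pivoting."""
    n = len(a)
    for i in range(n):
        com = a[i][i]
        if com == 0.0:
            raise ZeroDivisionError("zero pivot: rows are never shuffled in this procedure")
        a[i][i] = 1.0
        for j in range(n):
            a[i][j] /= com
        for k in range(n):
            if k == i:
                continue
            com = a[k][i]
            a[k][i] = 0.0
            for j in range(n):
                a[k][j] -= a[i][j] * com
    return a


matrix = [[4.0, 7.0], [2.0, 6.0]]
invert_in_place(matrix)

# A non-singular matrix on which the procedure fails (zero in the first pivot position).
try:
    invert_in_place([[0.0, 1.0], [1.0, 0.0]])
    failed = False
except ZeroDivisionError:
    failed = True
```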

GENERATING A BIT-REVERSING TABLE: 

b(0) <- 0
m <- 1

Do i = 0 ( 1 ) ln-1

    Do j = 0 ( 1 ) m-1

        b(j) <- b(j)*2

        b(j+m) <- b(j)+1

    End

    m <- m*2
End
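The table construction above, as a short sketch (the function name is an assumption; ln is the number of bits, so the table has 2^ln entries):

```python
def bit_reverse_table(ln):
    """Return b where b[j] is j with its ln-bit binary representation reversed."""
    b = [0] * (1 << ln)
    m = 1
    for _ in range(ln):
        for j in range(m):
            b[j] *= 2           # existing entries gain a trailing zero bit
            b[j + m] = b[j] + 1  # the new upper half gets the same bits plus one
        m *= 2
    return b


table = bit_reverse_table(3)
```

Each pass doubles the filled part of the table, so the total work is proportional to the table size.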
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SOME FOURIER TRANSFORM METHODS FOR IMAGES: 

Fourier transforms for images are two-dimensional and two-sided. In this 
they differ from time-series type transforms which are one-dimensional 
and often pertain to one-sided functions (impulse responses must be 0 for
negative values of time).

The general formula for n dimensions is: (Note: f and g are complex)

    g(u) = \frac{1}{(2\pi)^{n/2}} \int f(x) e^{-i u \cdot x} dx

    f(x) = \frac{1}{(2\pi)^{n/2}} \int g(u) e^{+i u \cdot x} du

Where x is the n-dimensional source-space vector, u is the n-dimensional
transform-space vector and g is the transform of f. For two dimensions:

    g(u,v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) e^{-i(ux+vy)} dx dy

    f(x,y) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(u,v) e^{+i(ux+vy)} du dv

Many functions of interest are rotationally symmetric and can be dealt
with by use of the one-dimensional integrals obtained after introducing
the polar coordinates (r, \theta) for (x, y) and (\rho, \phi) for (u, v).

    g(\rho) = \int_0^{\infty} f(r) r J_0(r\rho) dr

    f(r) = \int_0^{\infty} g(\rho) \rho J_0(r\rho) d\rho

Where J_0 is the zeroth order Bessel function.
Note that f and g are now real-valued. 
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This follows from: 

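    g(\rho) = \frac{1}{2\pi} \int_0^{\infty} \int_0^{2\pi} f(r) e^{-i(r\rho\cos\theta\cos\phi + r\rho\sin\theta\sin\phi)} r d\theta dr

            = \frac{1}{2\pi} \int_0^{\infty} f(r) r \int_0^{2\pi} e^{-i r\rho\cos(\theta - \phi)} d\theta dr

            = \frac{1}{2\pi} \int_0^{\infty} f(r) 2\pi r J_0(r\rho) dr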



We can apply these results to a few useful examples: 

The pillbox: f(r) = 1 for r \le R, 0 otherwise. This is the point-spread
function produced by defocusing.

    g(\rho) = \int_0^R r J_0(r\rho) dr = \frac{1}{\rho^2} \int_0^{R\rho} x J_0(x) dx = R^2 \frac{J_1(R\rho)}{R\rho}

Since \frac{d}{dx} [ x J_1(x) ] = x J_0(x).

So the function J_1(x)/x plays the role here that sin x/x plays for
one-dimensional systems.

The gaussian:

    f(r) = e^{-\frac{1}{2}(r/\sigma)^2}

    \int_0^{\infty} e^{-\frac{1}{2}(r/\sigma)^2} r J_0(r\rho) dr = \sigma^2 e^{-\frac{1}{2}(\rho\sigma)^2}

using

    \int_0^{\infty} e^{-a x^2} J_n(bx) x^{n+1} dx = \frac{b^n}{(2a)^{n+1}} e^{-b^2/(4a)}



The gaussian has some interesting properties. First it is the 
only rotationally symmetric function that can be factored into 
a product of a function of x and a function of y. Secondly it is 
the only function that "transforms into itself". 
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The gaussian is also a good first approximation to point-spread 
functions in some devices (at least for small r).

A scatter function: f(r) = e / r 

Analysis of total reflections in the face-plate of an imaging 
device leads to an equation of this form (at least for large r). 

9(f>) = 7 e _rV J (ry») r/r dr 

Note on scaling: For the gaussian we have the following relationship: 

    r_H \rho_H = 1.386          (r_H = 1.177 \sigma,  \rho_H = 1.177/\sigma)

where r_H is the half-intensity radius in the source space,
and \rho_H is the half-intensity radius in the transform space.
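The gaussian transform pair and the scaling constants can be checked numerically. The sketch below is an assumption-laden illustration: J_0 is evaluated from its integral representation J_0(x) = (1/\pi) \int_0^\pi \cos(x \sin t) dt, both integrals use the trapezoid rule, and the function names, step counts and cutoff r_max are choices made for the example.

```python
import math


def bessel_j0(x, steps=200):
    """J0(x) from (1/pi) * integral over [0, pi] of cos(x sin t), trapezoid rule."""
    h = math.pi / steps
    total = 0.5 * (math.cos(0.0) + math.cos(x * math.sin(math.pi)))
    for k in range(1, steps):
        total += math.cos(x * math.sin(k * h))
    return total * h / math.pi


def hankel_transform(f, rho, r_max, steps=800):
    """g(rho) = integral over [0, r_max] of f(r) r J0(r rho) dr, trapezoid rule."""
    h = r_max / steps
    total = 0.0  # the r = 0 endpoint contributes nothing because of the factor r
    for k in range(1, steps + 1):
        r = k * h
        weight = 0.5 if k == steps else 1.0
        total += weight * f(r) * r * bessel_j0(r * rho)
    return total * h


sigma = 1.0
rho = 1.0
numeric = hankel_transform(lambda r: math.exp(-0.5 * (r / sigma) ** 2), rho, 8.0 * sigma)
analytic = sigma ** 2 * math.exp(-0.5 * (rho * sigma) ** 2)

# Half-intensity radii: exp(-r^2 / (2 sigma^2)) = 1/2 gives r_H = sigma sqrt(2 ln 2).
r_half = sigma * math.sqrt(2.0 * math.log(2.0))
rho_half = math.sqrt(2.0 * math.log(2.0)) / sigma
```

The product r_half * rho_half equals 2 ln 2 whatever the value of sigma, which is the scaling relation quoted above.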
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HOW COHERENT MONOCHROMATIC LIGHT AND A LENS DO FOURIER TRANSFORMS: 

[Figure: transparency, lens and image plane]



Let f be the focal length of the lens and \lambda the wavelength of light.
Plane monochromatic coherent light enters from the left and passes
through the transparency, being then focused by the lens on the image plane.
We assume that x and u are relatively small compared to f, so that
\theta will be small. We then have for the distance that the ray has to travel
from the point x on the source plane to the point u on the image plane:

    f/\cos\theta + f\cos\theta + x\sin\theta = f(2 + \theta^4/4 + \theta^6/12 + ...) + x(\theta - \theta^3/6 + ...)

For small \theta this is approximately:

    2f + x\theta

The phase-shift in radians is then:

    (2f + x\theta) 2\pi/\lambda

We can ignore the constant part of this and, considering that light
will arrive at the point u from all over the transparency and that
\theta is approximately u/f, we get:

    g(u) = \int_{-\infty}^{+\infty} f(x) e^{2\pi i x u/(\lambda f)} dx

Where f(x) is the amount of light passed through at the point x.
Now extend this to two dimensions and we finally have:

    g(u,v) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x,y) e^{2\pi i (xu + yv)/(\lambda f)} dx dy

Note that g(u,v) is complex. We can get an idea of scaling from this
equation.
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SOME HEURISTICS FOR TELLING WHAT HAPPENS WHEN YOU TRANSFORM A FUNCTION 



Source domain                        Transform domain

Periodic                             Discrete (non-zero only for some f)
Symmetric (about 0)                  Real
Non-zero for finite distance         Non-zero out to infinity
Compact                              Spread-out
Sharp transitions                    Lots of high frequency components
Sample of f(t)                       Periodic copies of F(w)
Sum of f(t) and g(t)                 Sum of F(w) and G(w)
Convolution of f(t) and g(t)         Product of F(w) and G(w)
Time shift of f(t) by \tau           Multiply F(w) by e^{jw\tau}
Integral of f(t)                     Divide by jw
Differential of f(t)                 Multiply by jw



These rules apply going either way in the transformation and may be
used simultaneously. Discrete Fourier transforms, for example, are both
periodic and discrete in both domains.
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MEASURING MODULATION TRANSFER FUNCTION USING SQUARE WAVES: 

It is very hard to produce images in which the intensity varies
sinusoidally. Yet such images are required in the traditional
determination of frequency response or modulation transfer function. An
alternative is the use of simple-to-produce square wave intensity
modulated images. Then, however, we have to recover the transfer function
from the measured results.

Let t be one of the spatial dimensions and w = 2\pi/T, where T is the
repetition interval. The input can be analysed into:

    f(t) = 1/2 + \sum_{n=1}^{\infty} b(n) \cos nwt     where b(n) = (2/(\pi n)) (-1)^{(n-1)/2}
                                                      for n odd, 0 otherwise

Let the transfer function be a(w). Then the output will be:

    g(t) = (1/2) a_0 + \sum_{n=1}^{\infty} b(n) a(nw) \cos nwt

We can easily normalise to let a_0 = 1. Let c(w) = \sum b(n) a(nw).

The problem is to recover a(w) from c(w). In the case of square-waves:

    c(w)  = (2/\pi) ( a(w) - (1/3)a(3w) + (1/5)a(5w) - (1/7)a(7w) + (1/9)a(9w) - (1/11)a(11w) ... )

    c(3w) = (2/\pi) ( a(3w) - (1/3)a(9w) + (1/5)a(15w) ... )

    c(5w) = (2/\pi) ( a(5w) - (1/3)a(15w) ... )

    c(7w) = (2/\pi) ( a(7w) ... )

    c(9w) = (2/\pi) ( a(9w) ... )

Now add appropriate high-order terms to c(w) to cancel out high-order
terms of a(w) and get:

    a(w) = (\pi/2) ( c(w) + (1/3)c(3w) - (1/5)c(5w) + (1/7)c(7w) + (1/11)c(11w) - (1/13)c(13w) ... )
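The cancellation can be carried to any order. In the sketch below the coefficient of c(kw) is taken as \mu(k)\chi(k)/k, where \chi(k) = (-1)^{(k-1)/2} for odd k and \mu is the Moebius function; this closed form is an assumption of the example (it reproduces the coefficients listed above, including the missing 9w term) and is not stated in the text. The gaussian test transfer function and all names are likewise illustrative.

```python
import math


def chi(n):
    """(-1)^((n-1)/2) for odd n, 0 for even n."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1


def mobius(n):
    """Moebius function: 0 if n has a squared prime factor, else (-1)^(number of primes)."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    if n > 1:
        result = -result
    return result


def square_wave_response(a, w, terms=201):
    """c(w) = sum over odd n of b(n) a(n w), with b(n) = (2/(pi n)) chi(n)."""
    return (2.0 / math.pi) * sum(chi(n) * a(n * w) / n for n in range(1, terms + 1, 2))


def recover_transfer(c, w, terms=201):
    """a(w) = (pi/2) * sum over odd k of mu(k) chi(k) c(k w) / k."""
    return (math.pi / 2.0) * sum(mobius(k) * chi(k) * c(k * w) / k
                                 for k in range(1, terms + 1, 2))


# Hypothetical transfer function that falls off quickly with frequency.
def a(w):
    return math.exp(-0.5 * w * w)


def c(w):
    return square_wave_response(a, w)


recovered = recover_transfer(c, 0.8)
coefficients = [mobius(k) * chi(k) for k in (1, 3, 5, 7, 9, 11, 13)]
```

In practice only a few terms matter, because the transfer function is small at the higher multiples of w.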
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CONVOLUTIONS OF PILL-BOXES: 



With a line:

Clearly the convolution is 2\sqrt{R^2 - r^2} = 2R\sqrt{1 - (r/R)^2} for |r| < R.
This then gives us the intensity profile of a defocused line.
Convolution of a pill-box with a step:
We simply integrate the above:

    \int 2\sqrt{R^2 - x^2} dx = x\sqrt{R^2 - x^2} + R^2 \sin^{-1}(x/R)

So we have the intensity profile of a defocused edge.
This can be rewritten in a slightly different form using:

    \sin^{-1} x + \cos^{-1} x = \pi/2



We can also use this to find the convolution of two pillboxes as 

2 F( - Li) 

2 
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WHAT A DEFOCUSED EDGE LOOKS LIKE: 













[Figure: intensity profile of a defocused edge]

Vertical: relative intensity, horizontal: (distance from edge/defocus radius)
Central slope: 2/(\pi R), 10% to 90% distance = 1.38 R
Derivative is (2/(\pi R)) \sqrt{1 - (r/R)^2}
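The figures quoted in the caption can be reproduced from the edge profile. This sketch assumes the profile is normalised to run from 0 to 1, i.e. E(r) = 1/2 + (1/\pi)((r/R)\sqrt{1 - (r/R)^2} + \sin^{-1}(r/R)), which is the integral of the stated derivative; the function names and the bisection search are illustrative choices.

```python
import math


def edge_profile(r, R):
    """Relative intensity at signed distance r from a step edge defocused with radius R."""
    t = max(-1.0, min(1.0, r / R))
    return 0.5 + (t * math.sqrt(1.0 - t * t) + math.asin(t)) / math.pi


def distance_10_to_90(R, tol=1e-12):
    """Distance between the 10% and 90% points, found by bisection and symmetry."""
    lo, hi = 0.0, R
    while hi - lo > tol * R:
        mid = 0.5 * (lo + hi)
        if edge_profile(mid, R) < 0.9:
            lo = mid
        else:
            hi = mid
    return 2.0 * lo


R = 1.0
width = distance_10_to_90(R)
h = 1e-6
central_slope = (edge_profile(h, R) - edge_profile(-h, R)) / (2.0 * h)
```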
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A LINEAR THEORY OF FEATURE POINT MARKING 

A first step in many line-finding programs is a process for 
determining which points in the image are likely to be on an edge. 
This is usually done by locating areas of rapid intensity variations.
Various ad hoc linear and non-linear techniques of varying support 
in the image are brought into play. It would be useful to have an 
anchor point on this spectrum of possible procedures. Since a lot is 
known about linear methods we might ask what linear method applied 
to a somewhat idealised image would do the job. 

Given a function f(x,y) which is constant within polygonal areas 
in the image, we are looking for a convolution function h(x,y) which
when applied to f(x,y) will be zero everywhere except on the edges. 

    g(x,y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x-x', y-y') h(x',y') dx' dy'

To attempt to answer this question we might start by asking what values 
we expect g(x,y) to take on the edges. Linearity considerations dictate 
that it somehow be proportional to the intensity step. In addition it 
must reflect the orientation of the step, to insure that superposition 
will work. A combination of a negative and a positive pulse will do the 
trick, provided the area under each is equal to the intensity step. 
Since the image is actually two-dimensional we will have two pulse walls 
running along each edge. 




[Figure: f(x,y) and the corresponding g(x,y)]
Note, by the way, that the regions of uniform intensity don't have to be
polygonal. Now it is pretty hard to guess what form h(x,y) will take.
A way to get a handle on this is to ask the inverse question: what
h'(x,y) when convolved with g(x,y) will produce f(x,y) ?

    f(x,y) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x-x', y-y') h'(x',y') dx' dy'

Well, it helps to look at some simple cases first. In particular if we
only have one contour (one closed curve made of the double pulsed wall)
we expect to get 0 if the convolution is about a point outside this
contour and the intensity step if the point is inside the contour.










A and C illustrate the above statements, while B and D are special cases
useful for deriving equations. From D in particular we find that
-2\pi r (d/dr) h'(r) = 1. This is assuming that h' must be rotationally
symmetric, which is clear from the other examples. We also noted that
convolving with the double pulse wall is just like taking the derivative.

    h'(r) = -(1/2\pi) \int (1/r) dr = -(1/2\pi) \log r
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Since this function also does the right thing for example B we have the 
desired result. Now we need to find h(x,y) from this. We do this by 
finding the fourier transform of h'(x,y) and noting that it must be the 
algebraic inverse of the transform of h(x,y). Since the functions are 
rotationally symmetric we get: 

    H'(\rho) = -(1/2\pi) \int_0^{\infty} \log r \; r J_0(r\rho) dr

Integrating by parts and using \int x J_0(x) dx = x J_1(x) as well as \int J_1(x) dx =
-J_0(x) we obtain:

    H'(\rho) = 1/(2\pi\rho^2)

To obtain the transform of h(x,y) we just invert this:

    H(\rho) = 1/H'(\rho) = 2\pi\rho^2

When we try to inverse transform this we get into convergence difficulties 
and soon discover that we have to expand our universe to that of generalised 
functions if we expect to win, even if we use convergence factors. It then 
also becomes reasonable to guess at the answer. Consider the sequence of 
"functions" obtained by repeatedly differentiating a unit step. The first 
is a pulse at the origin, the second two pulses of opposite sign. This 
function corresponds to (d/dx) in the following sense: if we convolve 
it with a function f(x) we obtain the derivative f'(x). Similarly, the next 
member of this sequence consists of a negative, a double height positive 
and another negative pulse and corresponds to (d^2/dx^2) and so on.

When we try to transform these "functions" we obtain the following:

    T(d/dx) = iu,   T(d^2/dx^2) = -u^2,   T(d/dy) = iv,   T(d^2/dy^2) = -v^2

And:

    T(d^2/dx^2 + d^2/dy^2) = -u^2 - v^2 = -\rho^2

So our h(x,y) is some multiple of the laplacian (d^2/dx^2 + d^2/dy^2).
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One can amuse oneself by showing that the convolution of h(x,y) and 
h'(x,y) is in fact zero everywhere except at the origin as it ought 
to be: 



    h(x,y) * h'(x,y) = (d^2/dx^2 + d^2/dy^2) (-1/2\pi) \log r

    d/dx \log r = d/dx (1/2) \log(x^2 + y^2) = x/(x^2 + y^2)

    d^2/dx^2 \log r = -(x^2 - y^2)/(x^2 + y^2)^2

    d^2/dy^2 \log r = (x^2 - y^2)/(x^2 + y^2)^2          by symmetry

    (d^2/dx^2 + d^2/dy^2) \log r = 0          except for x = y = 0

So log r is the function which has the surprising property of having
a curvature at each point which is exactly opposite to the curvature
at right angles. Next we might be interested in a discrete approximation
to the laplacian, particularly a rotationally symmetric one.






Now since we have all this nice linear theory a la Wiener available we
might as well mention that if the image is corrupted by gaussian spatially
independent noise we can apply his results to produce a least squares
approximation to g(x,y). We then find our convolution functions more spread
out than the laplacian and in fact they will contain a central peak surrounded
by a larger negative depression. The only problem is that only
some part of the noise in the image satisfies the criterion, a great deal
of it not being spatially independent and, what's worse, there is no reason
to suppose that a least squares approximation to our pulse walls would
be at all useful. Anyway, here it is, our anchor point for the spectrum
of feature point (or inhomogeneity) finders.
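The behaviour claimed for the laplacian can be seen on a small synthetic image. The sketch uses the common five-point difference approximation, which is an assumption of this example and not necessarily the rotationally symmetric operator the text has in mind; the image and the names are illustrative.

```python
def laplacian(image):
    """Five-point discrete laplacian; border points are left at zero."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            out[i][j] = (image[i - 1][j] + image[i + 1][j]
                         + image[i][j - 1] + image[i][j + 1]
                         - 4.0 * image[i][j])
    return out


# Hypothetical image: a square of intensity 5 on a background of intensity 1 (step of 4).
image = [[5.0 if 2 <= i <= 5 and 2 <= j <= 5 else 1.0 for j in range(8)]
         for i in range(8)]
g = laplacian(image)
```

The output is zero inside both uniform regions and forms a positive wall just outside the edge and a negative wall just inside it, each of height equal to the intensity step along straight parts of the edge.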
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FAST FOURIER TRANSFORM 

Once we have bit-reversed the complex array x containing the function 
to be transformed we proceed as follows, assuming ln = log_2 n.

n <- 2^ln
itn <- n/2
igr <- n/2
iga <- 2
is <- 1

Do i = 1 ( 1 ) ln

    Do ist = 0 ( iga ) n-1

        Do k = ist ( 1 ) ist+is-1 , iwb = 0 ( igr )
            a <- x(k+is)*w(iwb)+x(k)
            b <- x(k+is)*w(iwb+itn)+x(k)
            x(k) <- a
            x(k+is) <- b
        End

    End

    igr <- igr/2
    is <- iga
    iga <- iga*2

End

Where w(a) = e^{2\pi a i/n}

Note that the arrays x and w are complex valued and dimension n.
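A runnable rendering of the loop structure is sketched below, together with a direct evaluation for comparison. Assumptions: w(a) is taken with a positive exponent, so the result is \sum_k x_k e^{+2\pi i jk/n}; the bit-reversal table is rebuilt inside the example; the Python names (is_ for the variable is, and so on) are adaptations.

```python
import cmath


def bit_reverse_table(ln):
    """Table b with b[j] equal to j with its ln bits reversed."""
    b = [0] * (1 << ln)
    m = 1
    for _ in range(ln):
        for j in range(m):
            b[j] *= 2
            b[j + m] = b[j] + 1
        m *= 2
    return b


def fft(values):
    """Radix-2 transform of a sequence whose length is a power of two."""
    n = len(values)
    ln = n.bit_length() - 1
    if 1 << ln != n:
        raise ValueError("length must be a power of two")
    table = bit_reverse_table(ln)
    x = [complex(values[table[k]]) for k in range(n)]       # bit-reversed input
    w = [cmath.exp(2j * cmath.pi * a / n) for a in range(n)]
    itn, igr, iga, is_ = n // 2, n // 2, 2, 1
    for _ in range(ln):
        for ist in range(0, n, iga):
            iwb = 0
            for k in range(ist, ist + is_):
                a = x[k + is_] * w[iwb] + x[k]
                b = x[k + is_] * w[iwb + itn] + x[k]        # w(iwb+itn) = -w(iwb)
                x[k], x[k + is_] = a, b
                iwb += igr
        igr //= 2
        is_ = iga
        iga *= 2
    return x


def direct_transform(values):
    """Reference: X_j = sum_k x_k exp(+2 pi i j k / n), computed term by term."""
    n = len(values)
    return [sum(values[k] * cmath.exp(2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


data = [1.0, 2.0, -1.0, 0.5, 3.0, -2.0, 0.0, 1.5]
fast = fft(data)
slow = direct_transform(data)
max_error = max(abs(f - s) for f, s in zip(fast, slow))
```

Using the conjugate table w(a) = e^{-2\pi a i/n} instead would give the transform with the opposite sign convention.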



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CONTRAST IN A RECTANGULAR CORNER: 



One of the problems in generating line-drawings from complex scenes
is that in addition to the contrast-reduction due to scatter in the
imaging device there is also a great reduction in contrast due to
mutual illumination. To get a handle on this problem, consider the
simple case of two semi-infinite planes meeting at right-angles. The
light is incident at an angle \alpha w.r.t. one of the planes. The surface
is such that a fraction \rho of the incident light is reflected. Clearly for any
point on one of the half-planes one half is reflected into empty space,
the rest onto the other surface. Light incident at any point is a sum
of the light from the source and that reflected from the other plane.
If both planes are semi-infinite the intensity on each one will be
uniform since a point receives an amount of light from the other plane
that does not depend on the position of the point.

    I_1 = (\rho/2) I_2 + a \cos\alpha

    I_2 = (\rho/2) I_1 + a \sin\alpha

    I_1 = (\cos\alpha + (\rho/2) \sin\alpha) a / (1 - (\rho/2)^2)

    I_2 = (\sin\alpha + (\rho/2) \cos\alpha) a / (1 - (\rho/2)^2)

    Contrast = \frac{I_1 - I_2}{I_1 + I_2}
             = \frac{(1 - \rho/2)\cos\alpha - (1 - \rho/2)\sin\alpha}{(1 + \rho/2)\cos\alpha + (1 + \rho/2)\sin\alpha}
             = \frac{2 - \rho}{2 + \rho} \cdot \frac{\cos\alpha - \sin\alpha}{\cos\alpha + \sin\alpha}

    |Contrast| = \frac{|I_1 - I_2|}{I_1 + I_2} = \frac{2 - \rho}{2 + \rho} |\tan(\alpha - \pi/4)|

In the absence of reflection this will be |\tan(\alpha - \pi/4)|, so the
contrast is reduced by a factor

    (2 - \rho)/(2 + \rho)

This factor ranges from 1/3 to 1 as \rho ranges from 1 to 0. So for
objects that reflect most of the incident light, such as our white
cubes, this effect is worst, reducing the contrast by a factor of 3.

If we consider finite half-planes things get more hairy and the 
intensity on a given plane is no longer independent of position, 
falling off as one goes outward from the corner. In the corner 
itself however the situation is unchanged in the limit. So as far 
as the contrast across the edge in the image is concerned we can 
still use the above formula. Note that with finite half-planes a 
rigorous analysis would require knowledge of the distribution of 
reflected light with angle which was not needed in the above. 

If we consider other angles we find that the problem increases as the 
angle gets smaller. 

Suppose the angle between the two planes is \pi/k instead of \pi/2. Then
instead of (\rho/2) we have (1 - 1/k)\rho. The reduction in contrast then is:

    \frac{1 - (1 - 1/k)\rho}{1 + (1 - 1/k)\rho}

And when \rho = 1, the worst case, we have a reduction of 1/(2k-1).

It is clear that gray blocks are very much better in this respect than
white ones. For example if \rho = .5 instead of 1.0, the reduction is only
3/5 instead of 1/3 for the contrast.
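The corner formulas are easy to tabulate. In this sketch the function names are assumptions, a is the source term from the equations for the right-angle corner, and k = 2 corresponds to that right-angle case.

```python
import math


def corner_intensities(alpha, rho, a=1.0):
    """Intensities I_1, I_2 on the two faces of a right-angle corner."""
    half = rho / 2.0
    i1 = (math.cos(alpha) + half * math.sin(alpha)) * a / (1.0 - half ** 2)
    i2 = (math.sin(alpha) + half * math.cos(alpha)) * a / (1.0 - half ** 2)
    return i1, i2


def contrast(alpha, rho):
    """|I_1 - I_2| / (I_1 + I_2) for the right-angle corner."""
    i1, i2 = corner_intensities(alpha, rho)
    return abs(i1 - i2) / (i1 + i2)


def reduction_factor(rho, k=2.0):
    """Contrast reduction for planes meeting at angle pi/k with reflectance rho."""
    g = (1.0 - 1.0 / k) * rho
    return (1.0 - g) / (1.0 + g)


white = reduction_factor(1.0)          # white blocks, right angle
gray = reduction_factor(0.5)           # gray blocks, right angle
narrow = reduction_factor(1.0, k=3.0)  # white blocks, planes meeting at 60 degrees
check = contrast(0.3, 0.8)
```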
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SCATTER IN OUR IMAGE DISSECTOR (TVC): 

A considerable reduction in contrast in our image dissector is
caused by scatter of the incident light. This scatter goes undetected
when one concerns oneself with the point-spread function because it
corresponds to a very low, very wide skirt around the central blob.
The size of the central blob is determined by the resolution of the
device (or vice versa) and in our case has a half-intensity radius of
around .09 mm (in the centre of the field of view). The scatter skirt
however extends easily to the edge of the field of view 38 mm away.
It is so low that it would go undetected due to dim-cutoff if we
were looking at point sources. Only when it is integrated over large areas
is its effect noticeable. It turns out that about 33 % of the incident
light is scattered in this way. This causes a dramatic reduction of
contrast.

Several causes can be traced for this phenomenon. The lens contributes
some small amount of scatter but the major defects occur because of
multiple reflections in the face-plate and reflection from the aperture
plate at the end of the drift-tube. It is not known whether any electron
optic effects come into this as well. The light enters the face-plate
and is partially absorbed by the photocathode; some light is, however,
reflected and may bounce repeatedly inside the face-plate. Some fraction
of the light also passes right through the photocathode and strikes the
shiny nickel (!) aperture plate only to be reflected onto the back of
the photocathode.

These problems could be ameliorated by optically coating the face-plate
to avoid multiple reflections or by using a fiber-optic front-plate. The
aperture plate clearly ought to be made of some more reasonable material
(to avoid the magnetic problems) and should be fairly non-reflective.

(We might expect, by the way, that the front-plate scatter is worse for
larger iris diameter (lower f-stops) because the light will be entering
the face-plate from larger angles relative to the optical axis.)
As pointed out, this phenomenon only occurs when we are integrating
signals over large areas. To measure the effect, then, we have to
illuminate large areas. One method involves the use of a series of
white discs on a black background to be viewed by the image dissector.
For each size disc one records the intensity at the centre. This
method suffers from the fact that it is hard to find paper surfaces
that have a high reflectivity (> 50%) and others having a low
reflectivity (< 10%). The observed effect is then considerably lower
than expected; in addition, the scatter in the lens is included.

A better method is that of removing the lens, using a point source of
light (such as a distant lamp reflected in a metal sphere) and using the
iris to allow variable diameter circles of light to fall on the
photocathode. We observe in this way the integral of this scatter
function. Let the point-spread function be rotationally symmetric, f(r).

    F(r) = 2\pi \int_0^r f(t) t dt

If we wish we can differentiate the observed function and get:

    2\pi f(r) r
There is some reason to suppose that f(r) can be approximated by e" rr /r 



Anyway here is an experimentally obtained curve: 



[Figure: measured integral of the scatter function versus diameter of the illuminated circle]
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We can use our results to estimate the intensity at various points 
in an image consisting of large polygonal areas of uniform intensity.







Let's look at the intensity at the points A1, A2, B1, B2, C1, C2
assuming 33 % spill-over:

    A1  ( 90° out of 360° illuminated)     1. - .33 (3/4) = .75
    A2  (  "   "   "   "       "     )     0. + .33 (1/4) = .08

    B1  (180° out of 360° illuminated)     1. - .33 (1/2) = .84
    B2  (  "   "   "   "       "     )     0. + .33 (1/2) = .17

    C1  (270° out of 360° illuminated)     1. - .33 (1/4) = .92
    C2  (  "   "   "   "       "     )     0. + .33 (3/4) = .25

    D1                                     1.
    D2                                     0.
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WHY THE VIRTUAL IMAGE OF A POINT-SOURCE LOOKS EQUALLY BRIGHT FROM 
ALL DIRECTIONS: 




Consider the small surface ring where the light is incident at
an angle \theta w.r.t. the surface normal.

The incident area is: 2\pi r^2 \sin\theta \cos\theta d\theta = \pi r^2 \sin 2\theta d\theta

Light falling into this ring is reflected at an angle 2\theta w.r.t.
the incident ray and with a spread 2 d\theta. At the distance R,
the light reflected from the ring is spread into an area
2\pi R^2 \sin 2\theta \cdot 2 d\theta. The intensity per unit area at distance R is:

    I (\pi r^2 \sin 2\theta d\theta) / (2\pi R^2 \sin 2\theta \cdot 2 d\theta) = I (r/R)^2 / 4

So it is independent of what angle one is looking at it from.
This has implications for reflectivity models of surfaces made of
spherical particles. It is also useful in producing point-sources
with very small source areas (we are assuming both source and observer
distant from the sphere). Note that the factor of 4 comes from the
fact that the incident area is \pi r^2, while the light is reflected into
an area 4\pi R^2.
GLOB-TRACKING:

Suppose we have an intensity glob such as a ping-pong ball against
a dark background. The object is to track it using the random access
camera. Define a two-dimensional pattern of points. The spread and
position of this pattern will be servoed using the intensities read.

At each step input the intensities, find their maximum and minimum,
IMAX and IMIN. If IMAX is too small, go into search mode, otherwise
calculate the following sums:

    $\sum_i x_i I_i$,   $\sum_i y_i I_i$,   $\sum_i I_i$

Then adjust the position:

    [Position update illegible in the scan; the surviving fragment
    contains $\sum_i y_i I_i$.]

Then adjust the size of the pattern:

    [Size update illegible in the scan; the surviving fragments are
    M, (IMAX' - IMIN') and (IMAX - IMIN).]

Where (IMAX' - IMIN') is the desired state of intensity range.

Usually $\alpha > \beta$, e.g. $\alpha = 1/2$, $\beta = 1/8$.
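A minimal sketch of such a servo loop follows. Only the sums come from the text; the two update rules are assumptions, because the original equations did not survive the scan: the position moves a fraction `alpha` toward the intensity-weighted centroid, and the spread is scaled by a factor that depends on how far the measured range (IMAX - IMIN) is from the desired range. The 5 x 5 pattern, the function names and the Gaussian test glob are all hypothetical.

```python
import math

# 5 x 5 grid of unit offsets; sample points are centre + spread * offset.
PATTERN = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]


def track_step(cx, cy, spread, read_intensity,
               alpha=0.5, beta=0.125, desired_range=0.6, min_imax=0.05):
    """One servo step; returns (cx, cy, spread, locked)."""
    samples = [(dx * spread, dy * spread,
                read_intensity(cx + dx * spread, cy + dy * spread))
               for dx, dy in PATTERN]
    values = [s[2] for s in samples]
    imax, imin = max(values), min(values)
    if imax < min_imax:
        return cx, cy, spread, False      # caller should go into search mode
    total = sum(values)                   # sum of I_i
    sum_x = sum(x * i for x, _, i in samples)   # sum of x_i I_i
    sum_y = sum(y * i for _, y, i in samples)   # sum of y_i I_i
    # ASSUMED position update: move a fraction alpha toward the centroid.
    cx += alpha * sum_x / total
    cy += alpha * sum_y / total
    # ASSUMED size update: shrink when the range is larger than desired,
    # grow when it is smaller.
    spread *= 1.0 + beta * (desired_range - (imax - imin)) / desired_range
    return cx, cy, spread, True


def glob(x, y, x0=3.0, y0=2.0, sigma=2.0):
    """Hypothetical test image: a Gaussian glob on a dark background."""
    return math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))


cx, cy, spread, locked = 0.0, 0.0, 1.0, True
for _ in range(200):
    cx, cy, spread, locked = track_step(cx, cy, spread, glob)
print(locked, cx, cy, spread)
```

With a symmetric pattern the weighted centroid vanishes only when the glob is centred, which is why the position settles there; the size rule settles where the pattern just spans the desired intensity range.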
A RELATED ANALOG TYPE GLOB-TRACKER




Here g(t) is some test function like cos (wt), for example, and f is the 
external function such as intensity. An interesting case is obtained if 
we combine two of these circuits, one for x and one for y coordinates in 
an image dissector camera. We then have a star-tracker. The two g(t)'s 
will need to be "orthogonal" then, cos(wt) and sin(wt), for example. 



A similar circuit or equivalent program can be used for light-pen tracking.
Interesting variations concern the question of whether the low pass filter 
can be eliminated or replaced by some other device and whether g(t) can be 
removed or "self-generated". In other words one aims at a system that is 
self-contained and samples the image in a way dependent on what is in the 
image rather than some fixed predetermined pattern. 
THE SURVEYORS MARK AND FRIENDS:



To track an object using the image dissector camera it is desirable 
to have to read the intensity at as few steps as possible at each time 
interval. The pattern to look at must also be designed for three conflicting 
requirements: ease of acquisition, ease of tracking in fast motion and 
accuracy of locating when stationary. The first two cause the object to 
be fairly large, the last requires that some point on it be well defined. 
The program should have no difficulty in processing the intensities 
read and should be fairly independent of distance and orientation of 
the pattern. A radially symmetric pattern with black and white areas 
seems suitable. In particular, one consisting of a number of intersecting
lines with alternate segments filled in black and white seems a winner.
The one used by surveyors uses two lines; our robotics calibration
programs use three-line patterns.





The image processing is simple. One reads the intensity at a number of 
points on the circumference of a circle, finds the maximum and minimum 
and sets up hysteresis thresholds. The lines are detected at the points 
where the intensity crosses both thresholds in sequence. The six points 
define three lines. The centre is then estimated to be near the point of 
minimum sum of squares of perpendicular distances to the three lines. 
Image motion between successive scans can be almost the radius of the
pattern, while its centre can be located extremely accurately by shrinking 
the sampling circle. 
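The last step, estimating the centre from the detected lines, can be sketched as a small least-squares problem. This is an illustration under stated assumptions rather than the original program: each line is assumed to be given by its two crossing points on the sampling circle, and the function name is hypothetical.

```python
import math


def centre_from_lines(lines):
    """Point minimising the sum of squared perpendicular distances.

    lines -- list of ((x1, y1), (x2, y2)), two points on each line,
             e.g. opposite threshold crossings on the sampling circle.
    """
    sxx = sxy = syy = bx = by = 0.0
    for (x1, y1), (x2, y2) in lines:
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        nx, ny = -dy / norm, dx / norm      # unit normal of the line
        c = nx * x1 + ny * y1               # line equation: nx*x + ny*y = c
        sxx += nx * nx
        sxy += nx * ny
        syy += ny * ny
        bx += nx * c
        by += ny * c
    # Normal equations: [[sxx, sxy], [sxy, syy]] (px, py) = (bx, by)
    det = sxx * syy - sxy * sxy
    return ((syy * bx - sxy * by) / det, (sxx * by - sxy * bx) / det)


# Three lines through (2, 3), sampled on a circle of radius 5.
true_centre = (2.0, 3.0)
lines = []
for degrees in (0.0, 60.0, 120.0):
    t = math.radians(degrees)
    p = (true_centre[0] + 5 * math.cos(t), true_centre[1] + 5 * math.sin(t))
    q = (true_centre[0] - 5 * math.cos(t), true_centre[1] - 5 * math.sin(t))
    lines.append((p, q))
estimate = centre_from_lines(lines)
print(estimate)
```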



OBJECT ROTATION MATRIX:

Consider an object rotated first about the x-axis (pitch, p), then
about the y-axis (yaw, y) and finally about the z-axis (roll, r).
We are interested in the corresponding transformation matrix:

$$
\begin{pmatrix}
\cos r \cos y & (\cos r \sin y \sin p - \sin r \cos p) & (\cos r \sin y \cos p + \sin r \sin p) \\
\sin r \cos y & (\sin r \sin y \sin p + \cos r \cos p) & (\sin r \sin y \cos p - \cos r \sin p) \\
-\sin y & \cos y \sin p & \cos y \cos p
\end{pmatrix}
$$
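As a check on the matrix, the sketch below composes the three elementary rotations in the stated order (x, then y, then z) and compares the product with the closed form. It assumes the usual right-handed convention for the elementary rotations; the function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rot_x(p):
    c, s = math.cos(p), math.sin(p)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(y):
    c, s = math.cos(y), math.sin(y)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(r):
    c, s = math.cos(r), math.sin(r)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def rotation_matrix(p, y, r):
    """Pitch about x first, then yaw about y, then roll about z."""
    return matmul(rot_z(r), matmul(rot_y(y), rot_x(p)))


def closed_form(p, y, r):
    cp, sp = math.cos(p), math.sin(p)
    cy, sy = math.cos(y), math.sin(y)
    cr, sr = math.cos(r), math.sin(r)
    return [[cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
            [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
            [-sy, cy * sp, cy * cp]]


composed = rotation_matrix(0.3, -0.7, 1.1)
direct = closed_form(0.3, -0.7, 1.1)
max_difference = max(abs(composed[i][j] - direct[i][j])
                     for i in range(3) for j in range(3))
print(max_difference)
```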



STEREO IMAGE PROJECTION

Projection of point (x,y,z):

    Left eye:    x' = (x+s)f/z      y' = (y)f/z
    Right eye:   x" = (x-s)f/z      y" = (y)f/z

f is the distance the resulting images are to be viewed from.
2s is the eye separation.
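A direct transcription of the two projections, as a sketch; the function name is hypothetical. The horizontal disparity x' - x" works out to 2sf/z, so nearer points are separated more.

```python
def stereo_project(x, y, z, f, s):
    """Return ((x', y'), (x", y")) for the left and right eye.

    f -- viewing distance of the resulting images
    s -- half the eye separation
    """
    left = ((x + s) * f / z, y * f / z)
    right = ((x - s) * f / z, y * f / z)
    return left, right


left, right = stereo_project(1.0, 2.0, 10.0, f=20.0, s=1.5)
disparity = left[0] - right[0]
print(left, right, disparity)
```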
    3
    4     11
    5     15
    6     18
    7     23

EXPOSURE GUIDE FOR OUR DEC 340 DISPLAY

[Formula illegible in the scan; it relates the quantities defined below.]

f# - f-number indicated on lens.

k - empirically found to be about \ /\15 (gives rise to density of 

about 2 in negative; i.e. almost overexposed). 
$r_1$ - Half-intensity radius of spot on DEC 340, varies somewhat with I.
      Use .5 mm unless you have good reason to suspect other value.

$r_2$ - Half-intensity radius of blur in camera projected back onto
      display surface. Varies with lens and film used.
      Use .5 mm unless you have good reason to suspect other value.

$r_3$ - Spacing of points in image you are displaying. Use $\infty$ if all
      the points can be resolved in the image.
      Use .25 mm * $2^s$ for vectors, increments and characters of scale s.

s - Scale sent to DEC 340, 0-3.

A - ASA rating of film.
      For polaroid B/W: 3000
      For 35mm TRI-X : 300
      For 16mm TRI-X : 200

N - Argument to .NDIS ; i.e. number of times points are displayed.

P - Packing factor.
      1 for resolved points.
      $\max(1, (r_1 + r_2)/r_3)$ for one-dimensional sets of points
      (vectors, increments, characters).
      $\max(1, ((r_1 + r_2)/r_3)^2)$ for two-dimensional sets of points
      (rasters).

Filter factor. 1 for no filter, 2 for Wratten 15 (afterglow only),
      8 for Wratten 47 (flash only).

I - Intensity parameter sent to scope, 0-7. If varying intensities are
      to be recorded, use I=5: highlights will be slightly overexposed
      but the dark areas will not be completely under-exposed.
SOME LENS FORMULAE:

[Figure: lens with principal planes $P_1$ and $P_2$.]

Let $P_1$ be the front principal plane, $P_2$ the rear principal plane.
Let f be the focal length and the media on the two sides of the lens be
the same. Let $f_1$ be the object-lens distance and $f_2$ the lens-image
distance.

The de-magnification of the image is then:

    $M = f_1 / f_2$

We know that:

    $1/f_1 + 1/f_2 = 1/f$,   i.e.   $(f_1 - f)(f_2 - f) = f^2$

    $f_1 = (1 + M) f$
    $f_2 = (1 + 1/M) f$

Let d be the object to image distance (ignoring thick lens effect):

    $d = f_1 + f_2 = f (M + 2 + 1/M) = f (1 + M)^2 / M$

Let $x = (d/(2f) - 1)$, then:

    $M^2 - 2xM + 1 = 0$

    $x = (M^2 + 1)/(2M) = \frac{1}{2}(M + 1/M)$

    $M = x \pm \sqrt{x^2 - 1}$

The derivatives with respect to M are:

    $dd/dM = f (1 - 1/M^2) = f (M^2 - 1)/M^2$,   $df_1/dM = f$,   $df_2/dM = -f/M^2$

so that:

    $df_1/dd = M^2/(M^2 - 1)$,   $df_2/dd = -1/(M^2 - 1)$,   $df_1/df_2 = -M^2$

These formulae are useful for calculating focusing accuracy, for example.

Numerical aperture is defined as $n \sin(\theta/2)$, the f-stop as
$1/(2 \sin(\theta/2))$, where $\theta$ is the angle subtended by the lens
at the centre of the image.
The intensity at the image is proportional to $1/(\text{f-stop})^2$.



[Figure: lens with refractive indices $n_1$, $n_2$ and radii of
curvature $r_1$, $r_2$.]

Then we have the lens-makers equation:

    $1/f_1 + 1/f_2 = 1/f = (n_2 - n_1)(1/r_1 - 1/r_2)$

Optimal pin-hole radius (for diffraction to equal hole spread):

    $r = \sqrt{d\lambda}$, where d is the hole-image distance, $\lambda$ the wavelength.

Airy radius: $2\,(3.8317/(2\pi))\,\lambda\,(\text{f-stop}) = 1.22\,\lambda\,(\text{f-stop})$

(Since 3.8317 is the first zero of $J_1(x)/x$.)
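The relations between de-magnification, focal length and conjugate distances are easy to exercise numerically. The sketch below assumes the thin-lens form used above (same medium on both sides, thick-lens effects ignored); the function names are hypothetical.

```python
import math


def conjugate_distances(M, f):
    """Object-lens distance f1 and lens-image distance f2 for M = f1/f2."""
    return (1 + M) * f, (1 + 1 / M) * f


def demagnification(d, f):
    """Solve d = f (1 + M)^2 / M for the root with M >= 1."""
    x = d / (2 * f) - 1
    if x < 1:
        raise ValueError("no real solution: need d >= 4 f")
    # The other root, x - sqrt(x^2 - 1), is 1/M (object and image swapped).
    return x + math.sqrt(x * x - 1)


def airy_radius(wavelength, f_stop):
    """Radius of the first dark ring, 2 (3.8317 / (2 pi)) lambda (f-stop)."""
    return 2 * (3.8317 / (2 * math.pi)) * wavelength * f_stop


f1, f2 = conjugate_distances(4.0, 50.0)
M = demagnification(f1 + f2, 50.0)
print(f1, f2, M, airy_radius(0.00055, 8))
```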



FOCAL LENGTH CALIBRATION:

For accurate camera models one needs good measurements of the focal
length and the position of the principal planes.

[Figure: object, lens and image, with the object-lens distance $f_1$
and the lens-image distance $f_2$.]

Now

    $x = f_1 + a$,   $y = f_2 + b$   and   $(1/f_1) + (1/f_2) = (1/f)$

We measure several combinations of $x_i$ and $y_i$ and attempt to find
a, b and most important, f. We can assume that a and b are small
relative to f and that f is known approximately. We clearly
require three such sets of measurements and could use least-squares
methods if we had more. Unfortunately the equations are non-linear.
We can make them into polynomials in a, b and f however:

    $1/(x_i - a) + 1/(y_i - b) = 1/f$

    $((x_i + y_i) - (a + b)) f - (x_i - a)(y_i - b) = 0$

    $-(ab + bf + fa) + (x_i (f + b) + y_i (f + a)) - x_i y_i = 0$

We could solve this set of second order polynomials in 3 variables
in a number of ways. Perhaps the easiest is multi-dimensional
Newton-Raphson iteration. We consider this last expression as a function F
of the parameters a, b and f and are aiming for F(a,b,f) = 0. For this
we require the derivatives:

    $dF/da = y_i - b - f$,   $dF/db = x_i - a - f$,   $dF/df = (x_i + y_i) - (a + b)$

We can also use guessing or a least squares method. If we can select
the $x_i$ and $y_i$ we can also simplify the problem.
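The Newton-Raphson iteration for exactly three measurements can be sketched as follows, using the function F and the derivatives given above. The function names, the starting values a = b = 0 and the synthetic measurements are assumptions made for the illustration; with more than three measurements one would solve each step in the least-squares sense instead.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = rhs[i]
        solution.append(det3(replaced) / d)
    return solution


def calibrate(measurements, f_guess, iterations=20):
    """Newton-Raphson for (a, b, f) from three (x_i, y_i) pairs."""
    a, b, f = 0.0, 0.0, f_guess          # a and b assumed small
    for _ in range(iterations):
        F = [-(a * b + b * f + f * a) + x * (f + b) + y * (f + a) - x * y
             for x, y in measurements]
        J = [[y - b - f, x - a - f, (x + y) - (a + b)]
             for x, y in measurements]
        da, db, df = solve3(J, [-value for value in F])
        a, b, f = a + da, b + db, f + df
    return a, b, f


# Synthetic data: f = 10, a = 0.3, b = -0.2, object distances f1 = 15, 20, 30.
true_a, true_b, true_f = 0.3, -0.2, 10.0
measurements = []
for f1 in (15.0, 20.0, 30.0):
    f2 = true_f * f1 / (f1 - true_f)     # from 1/f1 + 1/f2 = 1/f
    measurements.append((f1 + true_a, f2 + true_b))
a, b, f = calibrate(measurements, f_guess=9.5)
print(a, b, f)
```

Because the quadratic part of F is the same for every measurement, the iteration converges very quickly from a reasonable guess of f.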
Special Case: If we can choose $x_i$ and $y_i$ we might try the following:

    $y_1 = \infty$:   $x_1 = f + a$,   $a = x_1 - f$
    $x_2 = \infty$:   $y_2 = f + b$,   $b = y_2 - f$

We need one more measurement:

    $1/(x_3 - x_1 + f) + 1/(y_3 - y_2 + f) = 1/f$

Let $x_3 - x_1 = x'$, $y_3 - y_2 = y'$:

    $(y' + f + x' + f) f - (x' + f)(y' + f) = 0$
    $(x' + y') f + 2f^2 - x'y' - (x' + y') f - f^2 = 0$

    $-x'y' + f^2 = 0$,   $f = \sqrt{x'y'}$
    $f = \sqrt{(x_3 - x_1)(y_3 - y_2)}$

For accuracy we want both differences large; this implies that
we want $x_3$ about the same magnitude as $y_3$.
DETERMINING THE TRANSFORM FROM ARM TO EYE SPACE:

Being a rotation and translation we expect:

$$
\begin{pmatrix} x_v \\ y_v \\ z_v \end{pmatrix} =
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} x_a \\ y_a \\ z_a \end{pmatrix} +
\begin{pmatrix} a_{14} \\ a_{24} \\ a_{34} \end{pmatrix}
$$

And the matrix ought to be orthogonal (i.e. $A^T A = I$). The coordinates
with an a-subscript are arm coordinates, those with a v-subscript are
eye coordinates. By allowing the matrix to be non-orthogonal we can
absorb some of the distortions and non-linearities. In any case
forcing it to be orthogonal introduces a non-linear constraint that
messes up the mathematics! We then have to use iterative methods
well-known in the art of reducing aerial photographs.

Next we have to consider the projection into the image plane:

    $u = (x_v / z_v)\,\alpha + u_0$
    $v = (y_v / z_v)\,\beta + v_0$

$\alpha$ and $\beta$ are normally the same more or less and depend on the
focal length and the translation from image coordinates to deflection
units. $u_0$ and $v_0$ are zero if we choose the image origin on the
optical axis, which may at times be convenient. It is not hard to show
that $\alpha$, $\beta$, $u_0$ and $v_0$ can be absorbed into our first
transform and we can consider the simpler case:

    $u = x_v / z_v$   and   $v = y_v / z_v$

This does make the matrix non-orthogonal however. Clearly, multiplying
all the $a_{ij}$'s by any factor causes no change in the image coordinates
and we can therefore choose a fixed value for one of them, say $a_{34} = 1$.
We then have the problem of determining the values of the other 11 terms.
We need at least 11 equations then and preferably more so as to allow
a least-squares solution. One method of determining the transformation
matrix depends on moving the arm into n known positions and recording the
corresponding $x_{ai}$, $y_{ai}$, $z_{ai}$ and image coordinates $u_i$
and $v_i$. It is convenient to track a special object held in the hand
as it moves around rather than to blindly move the hand and try to
locate it in the image.

For each such measurement we get 2 equations:

    $x_v - z_v u_i = 0$   and   $y_v - z_v v_i = 0$

    $a_{11} x_{ai} + a_{12} y_{ai} + a_{13} z_{ai} + a_{14} - a_{31} u_i x_{ai} - a_{32} u_i y_{ai} - a_{33} u_i z_{ai} - u_i a_{34} = 0$

    $a_{21} x_{ai} + a_{22} y_{ai} + a_{23} z_{ai} + a_{24} - a_{31} v_i x_{ai} - a_{32} v_i y_{ai} - a_{33} v_i z_{ai} - v_i a_{34} = 0$

For n such measurements we get 2n such equations which can be separated into
two groups and written in matrix form as follows:

$$
\begin{pmatrix}
x_{a1} & y_{a1} & z_{a1} & 1 & 0 & 0 & 0 & 0 & -u_1 x_{a1} & -u_1 y_{a1} & -u_1 z_{a1} & -u_1 \\
x_{a2} & y_{a2} & z_{a2} & 1 & 0 & 0 & 0 & 0 & -u_2 x_{a2} & -u_2 y_{a2} & -u_2 z_{a2} & -u_2 \\
\vdots & & & & & & & & & & & \vdots \\
x_{an} & y_{an} & z_{an} & 1 & 0 & 0 & 0 & 0 & -u_n x_{an} & -u_n y_{an} & -u_n z_{an} & -u_n \\
0 & 0 & 0 & 0 & x_{a1} & y_{a1} & z_{a1} & 1 & -v_1 x_{a1} & -v_1 y_{a1} & -v_1 z_{a1} & -v_1 \\
0 & 0 & 0 & 0 & x_{a2} & y_{a2} & z_{a2} & 1 & -v_2 x_{a2} & -v_2 y_{a2} & -v_2 z_{a2} & -v_2 \\
\vdots & & & & & & & & & & & \vdots \\
0 & 0 & 0 & 0 & x_{an} & y_{an} & z_{an} & 1 & -v_n x_{an} & -v_n y_{an} & -v_n z_{an} & -v_n
\end{pmatrix}
\begin{pmatrix} a_{11} \\ a_{12} \\ \vdots \\ a_{34} \end{pmatrix} = 0
$$
Setting $a_{34}$ to 1 and taking the resulting constant terms
($u_1 \ldots u_n$, $v_1 \ldots v_n$) to the right hand side we obtain
2n equations in 11 unknowns. We can make do with 5 1/2 experimental
measurements or attempt a least-squares solution for $n \geq 6$ points.
Not more than 3 points should be in any one plane in the first instance
to avoid degeneracy. It is convenient to use the points at the tips of
an octahedron.

Notes: 1. A slightly different formulation leads to 18 equations in 18
          unknowns; the same results are obtained. (This corresponds to
          the homogeneous representation.)

       2. If we had assumed orthogonality we would have introduced 3
          more constraints and needed only 8 equations, that is 4
          experimental points which could conveniently be the corners
          of a tetrahedron.
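A sketch of the least-squares solution with $a_{34} = 1$ follows. It builds the 2n equations described above and solves the normal equations by Gaussian elimination; the normal-equation route, the function names and the synthetic test transform are choices made for this illustration, not something stated in the text.

```python
def solve_linear(m, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_transform(arm_points, image_points):
    """Least-squares estimate of the 3x4 transform with a34 fixed at 1."""
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(arm_points, image_points):
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rhs.append(u)                      # constant term u_i * a34
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.append(v)                      # constant term v_i * a34
    n = 11
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    s = solve_linear(ata, atb)
    return [s[0:4], s[4:8], s[8:11] + [1.0]]


def project(a, point):
    """Simplified image coordinates u = x_v / z_v, v = y_v / z_v."""
    x, y, z = point
    xv, yv, zv = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in a)
    return xv / zv, yv / zv


# Hypothetical transform and arm positions, used only to test the fit.
true_a = [[0.9, 0.1, -0.2, 0.5], [-0.1, 1.1, 0.3, -0.4], [0.05, -0.02, 0.1, 1.0]]
arm_points = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
arm_points += [(0.5, 0.2, -0.3), (-0.4, 0.7, 0.1)]
image_points = [project(true_a, p) for p in arm_points]
fitted = fit_transform(arm_points, image_points)
worst = max(abs(fitted[i][j] - true_a[i][j]) for i in range(3) for j in range(4))
print(worst)
```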
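RELATION BETWEEN THE SIMPLIFIED AND THE REAL IMAGE COORDINATES:

Given:

$$
\begin{pmatrix} x_v \\ y_v \\ z_v \end{pmatrix} =
\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
\begin{pmatrix} x_a \\ y_a \\ z_a \end{pmatrix} +
\begin{pmatrix} a_{14} \\ a_{24} \\ a_{34} \end{pmatrix}
$$

and

    $u = x_v / z_v$   and   $v = y_v / z_v$

Where $x_a$, $y_a$, $z_a$ are coordinates relative to the arm coordinate
system, $x_v$, $y_v$, $z_v$ are coordinates relative to the eye and u and v
are image coordinates in the simplified system.

Now we introduce the real image coordinates:

    $u' = u\alpha + u_0$   and   $v' = v\beta + v_0$

    $\alpha x_v - (u' - u_0) z_v = 0$   and   $\beta y_v - (v' - v_0) z_v = 0$

    $(\alpha a_{11} + u_0 a_{31}) x_a + (\alpha a_{12} + u_0 a_{32}) y_a + (\alpha a_{13} + u_0 a_{33}) z_a + (\alpha a_{14} + u_0 a_{34}) - u' (a_{31} x_a + a_{32} y_a + a_{33} z_a + a_{34}) = 0$

    $(\beta a_{21} + v_0 a_{31}) x_a + (\beta a_{22} + v_0 a_{32}) y_a + (\beta a_{23} + v_0 a_{33}) z_a + (\beta a_{24} + v_0 a_{34}) - v' (a_{31} x_a + a_{32} y_a + a_{33} z_a + a_{34}) = 0$

So finally:

$$
\begin{pmatrix} x_v' \\ y_v' \\ z_v' \end{pmatrix} =
\begin{pmatrix}
(\alpha a_{11} + u_0 a_{31}) & (\alpha a_{12} + u_0 a_{32}) & (\alpha a_{13} + u_0 a_{33}) \\
(\beta a_{21} + v_0 a_{31}) & (\beta a_{22} + v_0 a_{32}) & (\beta a_{23} + v_0 a_{33}) \\
(a_{31}) & (a_{32}) & (a_{33})
\end{pmatrix}
\begin{pmatrix} x_a \\ y_a \\ z_a \end{pmatrix} +
\begin{pmatrix} (\alpha a_{14} + u_0 a_{34}) \\ (\beta a_{24} + v_0 a_{34}) \\ (a_{34}) \end{pmatrix}
$$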
A VERTICAL PREDICATE FOR IMAGE LINES:

In a perspective transformation of the world a set of parallel lines
will project into a bundle of lines passing through one point, the
vanishing point $x_f$, $y_f$.

If we simply assume that vertical means parallel to the arm's z-axis:

Then as $z_a \to \infty$:  $x_v \to a_{13} z_a$,  $y_v \to a_{23} z_a$,  $z_v \to a_{33} z_a$

And so $x_f = a_{13}/a_{33}$ and $y_f = a_{23}/a_{33}$

In practice vertical means perpendicular to the table. Suppose the table
equation is given by:

    $p_1 x + p_2 y + p_3 z + p_4 = 0$

Then let $x_a = \alpha p_1$, $y_a = \alpha p_2$, $z_a = \alpha p_3$ and let $\alpha \to \infty$.

    $x_v \to \alpha (a_{11} p_1 + a_{12} p_2 + a_{13} p_3)$
    $y_v \to \alpha (a_{21} p_1 + a_{22} p_2 + a_{23} p_3)$
    $z_v \to \alpha (a_{31} p_1 + a_{32} p_2 + a_{33} p_3)$

    $x_f = (a_{11} p_1 + a_{12} p_2 + a_{13} p_3) / (a_{31} p_1 + a_{32} p_2 + a_{33} p_3)$

    $y_f = (a_{21} p_1 + a_{22} p_2 + a_{23} p_3) / (a_{31} p_1 + a_{32} p_2 + a_{33} p_3)$

To test if a line is vertical or near vertical we calculate the angle it
makes with the line connecting it to the vanishing point:

    $x_{12} = x_1 - x_2$,  $x_{1f} = x_1 - x_f$,  $y_{12} = y_1 - y_2$,  $y_{1f} = y_1 - y_f$

    $(\sin\theta)^2 = (x_{1f} y_{12} - x_{12} y_{1f})^2 / ((x_{1f}^2 + y_{1f}^2)(x_{12}^2 + y_{12}^2))$
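The predicate can be sketched directly from these formulas. The threshold `max_sin`, the function names and the example transform are assumptions made for the illustration; the transform is the 3x4 array of $a_{ij}$ in the simplified image coordinates.

```python
def vanishing_point(a, normal):
    """Image of the point at infinity along the table normal (p1, p2, p3)."""
    p1, p2, p3 = normal
    den = a[2][0] * p1 + a[2][1] * p2 + a[2][2] * p3
    xf = (a[0][0] * p1 + a[0][1] * p2 + a[0][2] * p3) / den
    yf = (a[1][0] * p1 + a[1][1] * p2 + a[1][2] * p3) / den
    return xf, yf


def sin_squared(p1, p2, vp):
    """(sin theta)^2 between the line p1-p2 and the line from p1 to vp."""
    x12, y12 = p1[0] - p2[0], p1[1] - p2[1]
    x1f, y1f = p1[0] - vp[0], p1[1] - vp[1]
    cross = x1f * y12 - x12 * y1f
    return cross ** 2 / ((x1f ** 2 + y1f ** 2) * (x12 ** 2 + y12 ** 2))


def is_vertical(p1, p2, vp, max_sin=0.1):
    """True if the image line p1-p2 points (nearly) at the vanishing point."""
    return sin_squared(p1, p2, vp) <= max_sin ** 2


def project(a, point):
    x, y, z = point
    xv, yv, zv = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in a)
    return xv / zv, yv / zv


# Hypothetical transform; the table is taken as the plane z = 0.
a = [[0.9, 0.1, -0.2, 0.5], [-0.1, 1.1, 0.3, -0.4], [0.05, -0.02, 0.1, 1.0]]
vp = vanishing_point(a, (0.0, 0.0, 1.0))
upright = is_vertical(project(a, (1, 1, 0)), project(a, (1, 1, 2)), vp)
flat = is_vertical(project(a, (1, 1, 0)), project(a, (2, 1, 0)), vp)
print(vp, upright, flat)
```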
GOING FROM IMAGE COORDINATES TO ARM SPACE COORDINATES:

Clearly we need some extra information to make up for the lack of
one dimension. But first let's look at what we have:

    $u = x_v / z_v$   and   $v = y_v / z_v$

    $x_v - u z_v = 0$   and   $y_v - v z_v = 0$

    $(a_{11} - u a_{31}) x_a + (a_{12} - u a_{32}) y_a + (a_{13} - u a_{33}) z_a = -(a_{14} - u a_{34})$
    $(a_{21} - v a_{31}) x_a + (a_{22} - v a_{32}) y_a + (a_{23} - v a_{33}) z_a = -(a_{24} - v a_{34})$

We need a third equation in $x_a$, $y_a$ and $z_a$ to be able to solve.
We could, for example, be given any one of these three coordinates.
More likely is the case where we have some relation to the table. Let
the equation of the table be given by:

    $p_1 x + p_2 y + p_3 z + p_4 = 0$

If the point is on the table we can simply use this equation.

If the point is in the same plane parallel to the table as some other
known point $x_1$, $y_1$, $z_1$ then we use the equation:

    $p_1 x_a + p_2 y_a + p_3 z_a = p_1 x_1 + p_2 y_1 + p_3 z_1$

If the point is directly above (along a line normal to the table) some
other known point $x_2$, $y_2$, $z_2$ then we have:

    $(x_a - x_2) p_3 - (z_a - z_2) p_1 = 0$

    $(y_a - y_2) p_3 - (z_a - z_2) p_2 = 0$

We only need one of these and will use the first since it comes out more
accurately with the eye-arm geometries we use.
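For the simplest case, a point known to lie on the table, the three linear equations can be solved as sketched below. The function names and the example transform are hypothetical; Cramer's rule is used only to keep the example self-contained.

```python
def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule."""
    d = det3(m)
    out = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for i in range(3):
            replaced[i][col] = rhs[i]
        out.append(det3(replaced) / d)
    return out


def image_to_arm_on_table(a, u, v, table):
    """Arm coordinates of the image point (u, v), assumed to be on the table.

    a     -- 3x4 transform [a_ij] in the simplified image coordinates
    table -- (p1, p2, p3, p4) with p1 x + p2 y + p3 z + p4 = 0
    """
    p1, p2, p3, p4 = table
    m = [[a[0][j] - u * a[2][j] for j in range(3)],
         [a[1][j] - v * a[2][j] for j in range(3)],
         [p1, p2, p3]]
    rhs = [-(a[0][3] - u * a[2][3]), -(a[1][3] - v * a[2][3]), -p4]
    return solve3(m, rhs)


def project(a, point):
    x, y, z = point
    xv, yv, zv = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in a)
    return xv / zv, yv / zv


a = [[0.9, 0.1, -0.2, 0.5], [-0.1, 1.1, 0.3, -0.4], [0.05, -0.02, 0.1, 1.0]]
u, v = project(a, (1.0, 2.0, 0.0))
recovered = image_to_arm_on_table(a, u, v, (0.0, 0.0, 1.0, 0.0))
print(recovered)
```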
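TYPICAL ARM - EYE TRANSFORM

[Figure: arm and eye geometry; surviving labels: EYE, SHOULDER.]

Suppose we have the above simple geometry. Then:

$$
\begin{pmatrix} x_v \\ y_v \\ z_v \end{pmatrix} =
\begin{pmatrix} 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \\ -\cos\theta & 0 & -\sin\theta \end{pmatrix}
\begin{pmatrix} x_a \\ y_a \\ z_a \end{pmatrix} +
\begin{pmatrix} 0 \\ \sin\theta \, x_0 \\ \cos\theta \, x_0 + \ell \end{pmatrix}
$$

Now we change to real image coordinates. So we get:

$$
\begin{pmatrix}
(\alpha a_{11} + u_0 a_{31}) & (\alpha a_{12} + u_0 a_{32}) & (\alpha a_{13} + u_0 a_{33}) \\
(\beta a_{21} + v_0 a_{31}) & (\beta a_{22} + v_0 a_{32}) & (\beta a_{23} + v_0 a_{33}) \\
(a_{31}) & (a_{32}) & (a_{33})
\end{pmatrix}
=
\begin{pmatrix}
-u_0 \cos\theta & \alpha & -u_0 \sin\theta \\
-\beta \sin\theta - v_0 \cos\theta & 0 & \beta \cos\theta - v_0 \sin\theta \\
-\cos\theta & 0 & -\sin\theta
\end{pmatrix}
$$

$$
\begin{pmatrix} (\alpha a_{14} + u_0 a_{34}) \\ (\beta a_{24} + v_0 a_{34}) \\ (a_{34}) \end{pmatrix}
=
\begin{pmatrix} u_0 (\cos\theta \, x_0 + \ell) \\ \beta \sin\theta \, x_0 + v_0 (\cos\theta \, x_0 + \ell) \\ \cos\theta \, x_0 + \ell \end{pmatrix}
$$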
Now suppose $\theta$ = 30°, cos $\theta$ = .86..., sin $\theta$ = .5,
$u_0 = v_0 = 512$. (Center of coordinates for the image dissector on a
scale of 0 - 1024.)

With a lens of 10" focal length we find that $\alpha$ = 3150 units/radian.

With a lens of 6.5" focal length $\alpha$ = 2000 units/radian.

(Assuming about 12.5 units per mm on the photocathode)

Next suppose $x_0$ = 30.0", $\ell$ = 50.0" and use of the 6.5" lens.

     -440.    2000.    -256.      38900
    -1440.       0.    1464.      68900
      -.86       0.     -.5          76

Next we normalise by setting $a_{34} = 1$:

     -5.8      26.3     -3.38       512
    -19.0       0.      19.3        907
     -.0113     0.      -.0066        1
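The numerical example can be reproduced from the symbolic form. The sketch below builds the transform for the simple geometry, folds in the real image coordinates and normalises by the last element. Exact trigonometric values are used rather than the rounded .86, so the figures differ slightly from a hand calculation; the function name is hypothetical.

```python
import math


def typical_transform(theta, alpha, beta, u0, v0, x0, ell):
    """Normalised 3x4 arm-to-image transform for the simple geometry."""
    c, s = math.cos(theta), math.sin(theta)
    # Simplified transform: rows are x_v, y_v, z_v.
    a = [[0.0, 1.0, 0.0, 0.0],
         [-s, 0.0, c, s * x0],
         [-c, 0.0, -s, c * x0 + ell]]
    # Fold in the real image coordinates u' = u alpha + u0, v' = v beta + v0.
    real = [[alpha * a[0][j] + u0 * a[2][j] for j in range(4)],
            [beta * a[1][j] + v0 * a[2][j] for j in range(4)],
            a[2][:]]
    scale = real[2][3]                    # normalise so that a34 = 1
    return [[value / scale for value in row] for row in real]


m = typical_transform(math.radians(30), 2000.0, 2000.0, 512.0, 512.0, 30.0, 50.0)
for row in m:
    print(["%.4f" % value for value in row])
```

Since $a_{14} = 0$ in this geometry, the last element of the first row is $u_0 a_{34}$ before normalising and therefore exactly $u_0$ afterwards.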



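ABSOLUTE ORIENTATION:

$$
\begin{pmatrix} X_p \\ Y_p \\ Z_p \end{pmatrix} = R
\begin{pmatrix} \alpha x_p \\ \beta y_p \\ \gamma z_p \end{pmatrix} +
\begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}
$$

Where R is an orthogonal rotation matrix. $\alpha$, $\beta$, $\gamma$ are
scale factors, often equal to one another. $x_0$, $y_0$, $z_0$ is a
displacement vector. There is one of these equations for each point in
the object.

Let $X_{ij} = X_i - X_j$ and $x_{ij} = x_i - x_j$, then:

$$
\begin{pmatrix} X_{ij} \\ Y_{ij} \\ Z_{ij} \end{pmatrix} = R
\begin{pmatrix} \alpha x_{ij} \\ \beta y_{ij} \\ \gamma z_{ij} \end{pmatrix}
$$

Multiplying this equation by its transpose we get:

$$
\begin{pmatrix} X_{ij} & Y_{ij} & Z_{ij} \end{pmatrix}
\begin{pmatrix} X_{ij} \\ Y_{ij} \\ Z_{ij} \end{pmatrix} =
\begin{pmatrix} \alpha x_{ij} & \beta y_{ij} & \gamma z_{ij} \end{pmatrix} R^T R
\begin{pmatrix} \alpha x_{ij} \\ \beta y_{ij} \\ \gamma z_{ij} \end{pmatrix}
$$

Now noting that $R^T R = I$ we get:

    $(X_{ij}^2 + Y_{ij}^2 + Z_{ij}^2) = (\alpha^2 x_{ij}^2 + \beta^2 y_{ij}^2 + \gamma^2 z_{ij}^2)$

Now suppose we are given four points in each coordinate system:

$$
\begin{pmatrix} X_{12}^2 + Y_{12}^2 + Z_{12}^2 \\ X_{23}^2 + Y_{23}^2 + Z_{23}^2 \\ X_{34}^2 + Y_{34}^2 + Z_{34}^2 \end{pmatrix} =
\begin{pmatrix} x_{12}^2 & y_{12}^2 & z_{12}^2 \\ x_{23}^2 & y_{23}^2 & z_{23}^2 \\ x_{34}^2 & y_{34}^2 & z_{34}^2 \end{pmatrix}
\begin{pmatrix} \alpha^2 \\ \beta^2 \\ \gamma^2 \end{pmatrix}
$$

It is now easy to solve for $\alpha$, $\beta$, $\gamma$. Let $x'_{ij} = \alpha x_{ij}$ and so on:

$$
\begin{pmatrix} X_{12} \\ X_{23} \\ X_{34} \end{pmatrix} =
\begin{pmatrix} x'_{12} & y'_{12} & z'_{12} \\ x'_{23} & y'_{23} & z'_{23} \\ x'_{34} & y'_{34} & z'_{34} \end{pmatrix}
\begin{pmatrix} r_{11} \\ r_{12} \\ r_{13} \end{pmatrix}
$$

Let X be the vector $[X_{12}\; X_{23}\; X_{34}]^T$ and similarly for Y and Z.

Let x' be the vector $[x'_{12}\; x'_{23}\; x'_{34}]^T$ and similarly for y' and z'.

Combining three equations like the above:

    $(X\; Y\; Z) = (x'\; y'\; z')\, R^T$

    $R^T = (x'\; y'\; z')^{-1} (X\; Y\; Z)$

$$
R^T =
\begin{pmatrix} x'_{12} & y'_{12} & z'_{12} \\ x'_{23} & y'_{23} & z'_{23} \\ x'_{34} & y'_{34} & z'_{34} \end{pmatrix}^{-1}
\begin{pmatrix} X_{12} & Y_{12} & Z_{12} \\ X_{23} & Y_{23} & Z_{23} \\ X_{34} & Y_{34} & Z_{34} \end{pmatrix}
$$

Due to measurement inaccuracies R determined this way may not be orthogonal;
one can if one wishes adjust it iteratively using Newton-Raphson: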
    $R_{n+1} = R_n - .5\, R_n (R_n^T R_n - I)$

or

    $R_{n+1} = .5\, ((R_n^T)^{-1} + R_n)$

Finally we have to find the displacement vector:

$$
\begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix} =
\begin{pmatrix} X_p \\ Y_p \\ Z_p \end{pmatrix} - R
\begin{pmatrix} \alpha x_p \\ \beta y_p \\ \gamma z_p \end{pmatrix}
$$

We can get four estimates from this which we can average if necessary.

If $\alpha = \beta = \gamma$ we can get away with only three points.
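The orthogonality adjustment can be sketched with the second form of the iteration, which averages the matrix with the inverse of its transpose. The function names, the fixed iteration count and the perturbed test matrix are assumptions made for the illustration.

```python
import math


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def orthonormalise(r, iterations=10):
    """Iterate R <- .5 ((R^T)^-1 + R) until R^T R is close to I."""
    for _ in range(iterations):
        inv_t = inverse3(transpose(r))
        r = [[0.5 * (inv_t[i][j] + r[i][j]) for j in range(3)] for i in range(3)]
    return r


# A rotation about z by 0.3 rad, perturbed to mimic measurement error.
c, s = math.cos(0.3), math.sin(0.3)
noisy = [[c + 0.02, -s + 0.01, 0.03],
         [s - 0.01, c - 0.02, -0.02],
         [0.01, 0.02, 1.04]]
fixed = orthonormalise(noisy)
gram = matmul(transpose(fixed), fixed)
error = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
            for i in range(3) for j in range(3))
print(error)
```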
GEOMETRY OF THE AMF-VERSTRAN ARM WITH THE ALLES HAND:

    L0      2.75"
    L0.5    1.0"
    L1      3.625"
    L1.5    1.0"
    L2      10.5"
    L2.5    .75"
    L3      4.75"
    L4      6.375"
    L5      .25"
    L6      2.0"
    L7      2.75"
    L8      .56"

[Figure: arm and hand showing the link dimensions L0 to L8 (L8 wide)
and the motions labelled "VERTICAL", SWING, "HORIZONTAL", ROTATE, TILT,
EXTEND, GRIP, (ROLL) and (YAW).]
COMPENSATION FOR GRIP MOTION:

[Figure: gripper linkage; surviving labels:
$\sqrt{\ell^2 - ((A - A_0)/2)^2}$ and $|A - A_0|/2$.]

The geometry of the grippers is equivalent to the above. We then find
a motion along the axis of the grippers w.r.t. the most extended:

    $\ell \left( 1 - \sqrt{1 - \left( \frac{A - A_0}{2\ell} \right)^2} \right)$

COMPENSATION FOR TILT MOTION:

When the tilt-axis is inclined $\theta$ w.r.t. vertical one can adjust the
horizontal extend by sin $\theta$ * hand-extend and the vertical motion by
cos $\theta$ * hand-extend.

On the whole, the arm geometry is very simple and allows direct
determination of joint angles and extensions given a desired hand
position and orientation, keeping in mind that one only has 5 degrees
of freedom.
CONVERSION FROM RECTANGULAR COORDINATES TO PSEUDO-POLAR:

The AMF arm has an offset in its otherwise extremely simple geometry:

[Figure: arm seen from above, showing the offset r, the extension R
and the angles $\alpha$, $\alpha_1$, $\alpha_2$.]

Given x, y we need to find R and $\alpha$.

    $R = \sqrt{x^2 + y^2 - r^2}$,   which requires   $x^2 + y^2 > r^2$

    $\tan\alpha_1 = x/y$,   $\tan\alpha_2 = r/R$

    $\tan\alpha = 1/\tan(\alpha_1 - \alpha_2) = \frac{yR + xr}{xR - yr} = \frac{xy + rR}{x^2 - r^2} = \frac{xy + rR}{R^2 - y^2}$

Here we cannot avoid the use of arctan because we actually need the angle.
Note that for the AMF arm r = 2.75".

We used the fact that $x^2 - r^2 = R^2 - y^2$

And that tan(a-b) = (tan a - tan b) / (1 + tan a tan b)

R is the "horizontal" extend and $\alpha$ is the "swing".
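A sketch of the conversion follows. The use of `atan2` to pick the quadrant and the forward model in `to_rectangular` are assumptions: the forward model is simply the one consistent with the formulas above, since the original figure is not available. Function names are hypothetical.

```python
import math


def to_pseudo_polar(x, y, r=2.75):
    """Return (R, swing) for hand position (x, y) and arm offset r."""
    rr = x * x + y * y - r * r
    if rr <= 0:
        raise ValueError("position lies inside the offset circle")
    R = math.sqrt(rr)
    # tan(swing) = (yR + xr) / (xR - yr); atan2 keeps the quadrant.
    swing = math.atan2(y * R + x * r, x * R - y * r)
    return R, swing


def to_rectangular(R, swing, r=2.75):
    """Assumed forward model consistent with the formulas above."""
    return (R * math.cos(swing) + r * math.sin(swing),
            R * math.sin(swing) - r * math.cos(swing))


R, swing = to_pseudo_polar(10.0, 5.0)
x_back, y_back = to_rectangular(R, swing)
print(R, math.degrees(swing), x_back, y_back)
```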
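MAINTAINING A CONSTANT HAND ORIENTATION

Let the normal unit vector to the plane containing the two fingers be (a,b,c)

$$
\begin{pmatrix} a \\ b \\ c \end{pmatrix} =
\begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{pmatrix}
\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
=
\begin{pmatrix}
\cos\alpha \sin\theta \cos\phi + \sin\alpha \sin\phi \\
-\sin\alpha \sin\theta \cos\phi + \cos\alpha \sin\phi \\
\cos\theta \cos\phi
\end{pmatrix}
$$

These are solved for the angles, keeping in mind that $a^2 + b^2 + c^2 = 1$.

Most commonly we would be given the swing, $\alpha$, then:

    $\sin\phi = (a \sin\alpha + b \cos\alpha)$

    $\tan\theta = (a \cos\alpha - b \sin\alpha)/c$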
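The two inversion formulas can be checked by a round trip: compute (a, b, c) from the three angles and recover two of them from the normal and the swing. The function names are hypothetical, and the sketch assumes $\cos\phi > 0$ so that `atan2` returns the intended $\theta$.

```python
import math


def finger_normal(alpha, theta, phi):
    """Unit normal (a, b, c) to the finger plane for the three angles."""
    a = (math.cos(alpha) * math.sin(theta) * math.cos(phi)
         + math.sin(alpha) * math.sin(phi))
    b = (-math.sin(alpha) * math.sin(theta) * math.cos(phi)
         + math.cos(alpha) * math.sin(phi))
    c = math.cos(theta) * math.cos(phi)
    return a, b, c


def angles_for_normal(a, b, c, alpha):
    """Recover (theta, phi) given the desired normal and the swing alpha."""
    sin_phi = a * math.sin(alpha) + b * math.cos(alpha)
    phi = math.asin(max(-1.0, min(1.0, sin_phi)))   # clamp rounding error
    theta = math.atan2(a * math.cos(alpha) - b * math.sin(alpha), c)
    return theta, phi


a, b, c = finger_normal(0.4, 0.3, -0.2)
theta, phi = angles_for_normal(a, b, c, 0.4)
print(a * a + b * b + c * c, theta, phi)
```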



MEASURING THE INERTIA OF A LINK IN AN ARM:

For fast arm motions one requires a good dynamic model of the arm,
including the geometry of joints and motor torques. Also required is
the approximate moment of inertia of the links in the arm. It is
usually not feasible to calculate these because of the complex shape
and number of parts a link is made of. A simple empirical method
requires one to measure the total mass and distance of the centre of
gravity from the connection to the preceding link as well as the
period of oscillation when the link is suspended from this connection.

Let

    m = mass of link, l = distance of c.g. from preceding connection
    g = acceleration due to gravity (9.8 meter/second$^2$)
    T = period of oscillation, I = moment of inertia

    $I \ddot\theta = -mgl \sin\theta$

    $\ddot\theta = -(mgl/I)\,\theta$   (for small $\theta$)

    $\theta = A \cos(\sqrt{mgl/I}\; t) = A \cos(2\pi t/T)$

    $T = 2\pi \sqrt{I/(mgl)}$

    $I = mgl\,(T/2\pi)^2$

Example: Pendulum made of string and heavy weight: $T = 2\pi\sqrt{l/g}$,
$I = mgl\,(l/g) = ml^2$
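The final formula is a one-liner; the sketch below also repeats the string-and-weight check from the example. The function name and the sample numbers are hypothetical, and the small-angle approximation is assumed as in the derivation.

```python
import math


def moment_of_inertia(mass, cg_distance, period, g=9.8):
    """I = m g l (T / 2 pi)^2 about the suspension point (SI units)."""
    return mass * g * cg_distance * (period / (2 * math.pi)) ** 2


# Check against a simple pendulum, for which I should equal m l^2.
m, l, g = 2.0, 0.5, 9.8
period = 2 * math.pi * math.sqrt(l / g)
inertia = moment_of_inertia(m, l, period, g)
print(inertia, m * l ** 2)
```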



